Recipe
Deduplicating Records in Redshift
Remove duplicate ids from any BI Connector table using ROW_NUMBER().
Tables
usercourses, quizattempts, userlearningpaths, userbundles
Tags
TL;DR
Redshift does not enforce primary keys. Wrap any BI Connector table in a ROW_NUMBER() OVER (PARTITION BY id ORDER BY updatedat DESC) CTE and keep row_num = 1.
Problem
High-volume tables (usercourses, quizattempts, userlearningpaths, userbundles) can contain more than one row with the same id. Aggregations like COUNT(*), SUM(...), and joins on id then silently over-count.
| usercourses.id | userid | status | updatedat |
|---|---|---|---|
| abc-123 | u1 | started | 2026-04-01 10:00:00 |
| abc-123 | u1 | completed | 2026-04-02 12:00:00 |
| def-456 | u2 | started | 2026-05-01 09:00:00 |
Solution
How it works
PARTITION BY idgroups rows by their (non-unique) primary key.ORDER BY updatedat DESCputs the freshest write first within each group.WHERE row_num = 1keeps only that freshest row.
Variations
- **Different "winner":** change
ORDER BYtograde DESC(best score),createdat ASC(earliest version), etc. - **Different uniqueness key:** if you want one row per
(userid, courseid)instead of perid, changePARTITION BY idtoPARTITION BY userid, courseid.
Performance
On usercourses (~112M rows) the window function scans the full table. Restrict the CTE with WHERE statusgroup = 'a' or a date filter on updatedat first when possible.
When NOT to use this
Don't apply this blindly to tables with intentionally multi-row keys (e.g. coursecustomfields has many rows per courseid on purpose — that's the EAV shape). Dedup on the table's natural business key, not always id.
Related recipes
Filtering to Active Enrollments Only
Exclude archived, inactive, and deleted enrollments from usercourses using statusgroup.
Handling Clamped 9999 Timestamps
Normalize the 9999-12-01 sentinel that the BI Connector uses for "no expiry" values.
Latest (or Highest-Scoring) Quiz Attempt per Learner
Reduce the quizattempts table to one canonical row per learner per quiz — either the most recent or the highest scoring.