janewilliams20: (Default)
[personal profile] janewilliams20
Of techy interest only - I'll cut-tag for the rest of you.

A problem I recently encountered on an old table and some old code - it was looking something up and expecting a unique answer, because the record in the lookup table had been accidentally entered twice.

At this point every blog post I've ever met wastes a lot of space on telling you that the table shouldn't have been allowed to get into that state, and telling us about primary and unique keys as if they were a new idea to us. Well, duh. Mind going back in time and telling whoever it was who made the error? Then we have to waste even more time on "maybe the records aren't really duplicates, maybe there's a numerical ID you haven't looked at?" Again, duh. No, the problem I've trying to solve, the interesting one, is where all data in all columns is identical, there is no simple WHERE clause that will pick out one but not the other.

How one handles this varies from one database server to another. Oracle had a rather nice pseudo-column long before anyone else. SQL Server? I found two methods, one using the command syntax to say "just delete the first one you find", another creating a pseudo-column.

This page gives the most detailed explanation I've found so far


Command syntax

SET ROWCOUNT 1
DELETE FROM mytable WHERE nonuniqueid = 'duplicate row'
SET ROWCOUNT 0

Yes, as simple as that. Tell it to only process one record. Do your deletion. Set the ROWCOUNT back to normal (no limit). I've seen this recommended a lot.

However, as the MS documentation says, this won't keep working in up-to-date SQL Server. " Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server." It recommends looking at TOP instead.

DELETE TOP (1)
FROM mytable
WHERE nonuniqueid = 'duplicate row'
Pseudocolumn

A little thing called "row_number()". It's got much wider uses than this

select nonuniqueid, datacols, row_number() OVER(partition by nonuniqueid order by nonuniqueid) AS nr
FROM mytable
WHERE nonuniqueid = 'duplicate row'
That gives you a nice sequential row number for the query. The order by is rather more use in other contexts, but this can now be expanded to

WITH numbered AS (
select nonuniqueid, datacols, row_number() OVER(partition by nonuniqueid order by nonuniqueid) AS nr
FROM mytable
WHERE nonuniqueid = 'duplicate row'
)
select * from numbered
where nr > 1

and when you're happy that's returning what you want,

WITH numbered AS (
select nonuniqueid, datacols, row_number() OVER(partition by nonuniqueid order by nonuniqueid) AS nr
FROM mytable
WHERE nonuniqueid = 'duplicate row'
)
delete from numbered
where nr > 1

Profile

janewilliams20: (Default)
janewilliams20

June 2020

S M T W T F S
 123456
78910111213
14151617181920
21222324252627
282930    

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Jul. 6th, 2025 10:59 am
Powered by Dreamwidth Studios