Just a quick post today about an issue I ran across while doing some work converting stored procedures in SQL Server to stored functions in PostgreSQL.
I ran across a bit of code similar to this:
delete from temp_tbl t left join (select username, max(login_timestamp) as max_time from temp_tbl group by username ) tbl on t.username = tbl.username and t.login_timestamp = tbl.max_time where tbl.username is null;
The point of this code is to update a table that contains users and their login times and delete all but the latest one. In SQL Server it seems that it’s common to use the left join to the table against itself. With a left join a row in the left-side table will always be present even if there isn’t a matching row on the right-side table. In that case a null will be present in the results for the right-side table. So this code is getting the full list of times, matching up against the max time, and deleting any other rows (because they will have null).
Unfortunately at the moment (as of PostgreSQL v11 which is the latest version at the time of this writing) PostgreSQL doesn’t support using a left join in the delete statement. So how do we do it?
Using NOT IN and a sub select would accomplish the same thing but the performance on that isn’t as good as what a left join could do. The best option is the NOT EXISTS clause. So our code becomes this in PostgreSQL:
delete from temp_tbl t where not exists ( select * from (select username, max(login_timestamp) as max_time from temp_tbl group by username) tbl where t.username = tbl.username and t.login_time = tbl.max_time);
I think the code is obvious here but it just grabs the max time for each user. The outer SQL will delete anything that isn’t returned in the inner select. Ultimately we get the same result in this code that we get from the example SQL Server code above, and more importantly I think it’s clearer than the left join version.