MySQL 5.6/5.7 Select top n records by group
MySQL 8.0 introduced some window functions that made it easier to write queries that many SQL experts have been writing forever in other SQL databases. This makes running one specific type of query – the “greatest n per group” or “most recent n per group” – much easier. The use case is simple: Let’s say you need to select a ranking of the top posts per blog tag, or the most recent products in a shop category, or the most reported comments by a user. More generically, we need a way to select the most x (recent, highest scoring, etc) item as grouped by y (some other dimension).
In MySQL 5.6 and 5.7, without window functions like `ROW_NUMBER()`, we must rely on session variables. Session variables are, surprise, variables that MySQL will store within a session that can be manipulated. In the case of retrieving the top records per group, we’ll use session variables to increment our records according to their `order by`, which we will then use in a `where` to ensure we only retrieve the top x records.
As a real world example, let’s say we need to find the top 10 products per category according to a product_rating field.
SELECT -- These are your final `select` fields, which reference the inner `select` fields below. The fields you want to select here must be in your `select` below. rs.product_id, rs.product_name, rs.product_rating, rs.category_id, rs.category_name, FROM ( SELECT records.* -- These are our session variables. , CASE WHEN @prev = records.category_id THEN @i:=@i+1 ELSE @i:=1 END i -- `@i` is our incrementer and `@prev` is keeping track of which "group" to increment against. , @prev:=records.category_id -- The "per group" aspect of this challenge is controlled by the `@prev` variable. When the SELECT encounters a new retrieved group, it will reset the incrementer back to 1. FROM ( SELECT p.product_id, p.product_name, p.product_rating, c.category_id, c.category_name FROM products p inner join categories_products pc on pc.product_id = p.product_id inner join categories c on c.category_id = pc.category_id -- Any other joins or wheres you need... ORDER BY c.category_id -- The first ORDER BY is our "grouping". We want to retrieve results grouped by category_id. , p.product_rating DESC -- The second order by should be the dimension you want to group by, in this case a product rating. If we wanted to find the top 10 most recent items, we would order by a created_date, for example. ) records -- The name records is arbitrary. It helps me organize the inner select as the "records" and the JOIN ( SELECT @prev:=null,@i:=0) vars -- This initilaizes our session variables ) rs WHERE rs.i <= 10 ORDER BY rs.tag_based_key, rs.i; -- Now we can only retrieve records with a LIMIT on our incrementer `i`.
The above will output the top 10 products per category, ranked by product_rating.