본문 바로가기

Program/DB

MySQL ROW_NUMBER()

MS-SQL 2005 부터 생긴 ROW_NUMBER() 함수와 비슷한 기능을 mysql에서 구현한 것입니다.


출처 : http://www.explodybits.com/2011/11/mysql-row-number/


Have you ever been in a situation where you are selecting records from a database that need to be ranked, but the column(s) you’re attempting to ORDER BY are not unique. For example, using the Orders table below, how could you display a distinct list of customers and their last order? The problem is that the date field contains only the date, but not the time. Therefore, it’s possible that two orders can be placed by the same customer on the same day. Now if we had a field defined as AUTO_INCREMENT in MySQL or IDENTITY in Microsoft SQL Server and the records were entered sequentially, this would be a simple task.

Orders Table

Example:
CustomerOrderDateAmount
Jane2011-01-0512
Jane2011-01-0715
Jane2011-01-0717
John2011-01-0111
John2011-01-0227
John2011-01-0213
Pat2011-02-055
Pat2011-02-0734
Pat2011-02-0712

This can be solved in MS SQL , and with a little more code in MySQL , as well. In MS SQL Server 2005+, uniquely identifying the above records is a breeze using the ROW_NUMBER() function. If your not familiar withROW_NUMBER(), MSDN defines theT-SQL ROW_NUMBER()as“Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.”

So, in order to display a distinct list of customers and uniquely identify their last order, we could write something like:

SELECT  ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY OrderDate DESC) AS RowNumber
       ,Customer
       ,OrderDate
       ,Amount
  FROM Orders
Results:
RowNumberCustomerOrderDateAmount
1Jane2011-01-0715
2Jane2011-01-0717
3Jane2011-01-0512
1John2011-01-0227
2John2011-01-0213
3John2011-01-0111
1Pat2011-02-0734
2Pat2011-02-0712
3Pat2011-02-055

Notice how a unique row number is now apparent on each row within the partition. The next step would be to encompasses the statement in a sub query or Common Table Expression (CTE), and filter out the unwanted records based on the generated row number.

SELECT  Customer
       ,OrderDate
       ,Amount
  FROM
      (
        SELECT  ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY OrderDate DESC) AS RowNumber
               ,Customer
               ,OrderDate
               ,Amount
          FROM Orders
      ) subquery WHERE RowNumber = 1
Results:
CustomerOrderDateAmount
Jane2011-01-0715
John2011-01-0227
Pat2011-02-0734

The result is a single record for each customer, even when the customer has more than one order on the same day.


MySQL Implementation

Recently, I ran into a similar situation on a WordPress implementation, but with Horses not Orders/Customers. The requirement was to display each horse’s last workout and next race. MySQL does not have a ROW_NUMBER()function. However, MySQL does allow for inline assignment of variables and the ability to reassign and reference those variables as the query is working its way through the execution. This allows the same functionality thatROW_NUMBER() provides to be achieved in MySQL. Sticking to the same example used above the MySQL solution would be.

SELECT  @row_num := IF(@prev_value=o.Customer,@row_num+1,1) AS RowNumber
       ,o.Customer
       ,o.OrderDate
       ,o.Amount
       ,@prev_value := o.Customer
  FROM Orders o,
      (SELECT @row_num := 1) x,
      (SELECT @prev_value := '') y
  ORDER BY o.Customer, o.OrderDate DESC
Results:
RowNumberCustomerOrderDateAmount
1Jane2011-01-0715
2Jane2011-01-0717
3Jane2011-01-0512
1John2011-01-0227
2John2011-01-0213
3John2011-01-0111
1Pat2011-02-0734
2Pat2011-02-0712
3Pat2011-02-055

A unique row number is now apparent on each row with in the partition. The @row_num variable holds the current row number and the @pev_value variable holds the current value of the partition by field. The variables are defined and assigned a default value with in subqueries. The Orders table and the subqueries are then combined in a single select statement. The @row_num variable is incremented by one until the @prev_value does not equal the Customer and is then reset back to one.

Important

  • The @row_num variable must be set before the @prev_value variable
  • The first field in the ORDER BY must be the field that you are partitioning by
  • The default value assigned to the @prev_value variable must not exist in the partition by field

As we did with the MS SQL ROW_NUMBER() example, we will need to encompass the statement in a sub query in order to filter based on the generated row number.

SELECT  Customer
       ,OrderDate
       ,Amount
  FROM
     (
      SELECT  @row_num := IF(@prev_value=o.Customer,@row_num+1,1) AS RowNumber
             ,o.Customer 
             ,o.OrderDate
             ,o.Amount
             ,@prev_value := o.Customer
        FROM Orders o,
             (SELECT @row_num := 1) x,
             (SELECT @prev_value := '') y
       ORDER BY o.Customer, o.OrderDate DESC
     ) subquery
 WHERE RowNumber = 1
Results:
CustomerOrderDateAmount
Jane2011-01-0715
John2011-01-0227
Pat2011-02-0734

The result is a single record for each customer, even when the customer had more than one order on the same day.

The best solution is to avoid being in this situation in the first place. However an existing data schema does not always lend itself to new requirements.