Index by category, ordered by a column, in R's sqldf package

Time: 2018-07-23 09:23:23

Tags: r sqldf

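How can I add an index within each category, ordered by a column, in R using the sqldf package? I am looking for the equivalent of this SQL: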

ROW_NUMBER() over(partition by [Category] order by [Date] desc)

Suppose we have a table:

+----------+-------+------------+
| Category | Value |    Date    |
+----------+-------+------------+
| apples   |     3 | 2018-07-01 |
| apples   |     2 | 2018-07-02 |
| apples   |     1 | 2018-07-03 |
| bananas  |     9 | 2018-07-01 |
| bananas  |     8 | 2018-07-02 |
| bananas  |     7 | 2018-07-03 |
+----------+-------+------------+

The desired result is:

+----------+-------+------------+-------------------+
| Category | Value |    Date    | Index by category |
+----------+-------+------------+-------------------+
| apples   |     3 | 2018-07-01 |                 3 |
| apples   |     2 | 2018-07-02 |                 2 |
| apples   |     1 | 2018-07-03 |                 1 |
| bananas  |     9 | 2018-07-01 |                 3 |
| bananas  |     8 | 2018-07-02 |                 2 |
| bananas  |     7 | 2018-07-03 |                 1 |
+----------+-------+------------+-------------------+

Thanks for the hints in the comments on how to do this with many packages other than sqldf: Numbering rows within groups in a data frame

1 Answer:

Answer 0 (score: 2)

1) PostgreSQL This can be done using the PostgreSQL backend to sqldf:

library(RPostgreSQL)
library(sqldf)

sqldf('select *, 
       ROW_NUMBER() over (partition by "Category" order by "Date" desc) as seq
       from "DF"
       order by "Category", "Date" ')

giving:

  Category Value       Date seq
1   apples     3 2018-07-01   3
2   apples     2 2018-07-02   2
3   apples     1 2018-07-03   1
4  bananas     9 2018-07-01   3
5  bananas     8 2018-07-02   2
6  bananas     7 2018-07-03   1

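2) SQLite To do this with the SQLite backend (the default backend), we need to revise the SQL statement appropriately. Make sure that RPostgreSQL is NOT loaded before doing this. We have assumed that the data is already sorted by Date within each Category, as in the data shown in the question. If that were not the case, it would be easy enough to extend the SQL to sort it first.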

library(sqldf)

sqldf("select a.*, count(*) seq 
       from DF a left join DF b on a.Category = b.Category and b.rowid >= a.rowid 
       group by a.rowid 
       order by a.Category, a.Date")
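
As a minimal sketch of the unsorted case mentioned above, one option is to compare dates rather than rowid values in the join, so that the count no longer depends on the order of the rows in the data frame. This assumes that dates are unique within each category and are stored as ISO-formatted (YYYY-MM-DD) strings, which compare correctly as text. The shuffled data frame DF2 is made up here purely for illustration.

```r
library(sqldf)

# Same rows as in the question, deliberately shuffled (hypothetical DF2)
DF2 <- data.frame(
  Category = c("bananas", "apples", "apples", "bananas", "apples", "bananas"),
  Value    = c(8, 1, 3, 7, 2, 9),
  Date     = c("2018-07-02", "2018-07-03", "2018-07-01",
               "2018-07-03", "2018-07-02", "2018-07-01"),
  stringsAsFactors = FALSE
)

# For each row, count the rows of the same category whose date is the same
# or later; this yields 1 for the latest date, 2 for the one before, etc.
res <- sqldf("select a.*, count(*) seq
       from DF2 a left join DF2 b
         on a.Category = b.Category and b.Date >= a.Date
       group by a.rowid
       order by a.Category, a.Date")
res
```

If two rows in a category shared the same date they would receive the same count, so this behaves like a rank rather than a strict row number in that case.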

Note

The input DF in reproducible form is:

Lines <- "
Category  Value  Date    
apples        3  2018-07-01 
apples        2  2018-07-02 
apples        1  2018-07-03 
bananas       9  2018-07-01 
bananas       8  2018-07-02 
bananas       7  2018-07-03 
"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
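
For what it is worth, SQLite itself gained window functions in version 3.25.0. On a setup where the installed RSQLite bundles that version or later (an assumption worth checking locally), the query from (1) might therefore also run on the default backend, with unquoted identifiers. A sketch, passing drv = "SQLite" so that the SQLite backend is used even if RPostgreSQL happens to be loaded:

```r
library(sqldf)

# Input data from the Note, repeated so the example is self-contained
DF <- read.table(text = "
Category  Value  Date
apples        3  2018-07-01
apples        2  2018-07-02
apples        1  2018-07-03
bananas       9  2018-07-01
bananas       8  2018-07-02
bananas       7  2018-07-03
", header = TRUE, as.is = TRUE)

# Window-function version; needs an SQLite build with window function support
res <- sqldf("select *,
       row_number() over (partition by Category order by Date desc) as seq
       from DF
       order by Category, Date", drv = "SQLite")
res
```

On an older SQLite build this query would be expected to fail with a syntax error, in which case the self-join in (2) remains the fallback.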