How can I add a per-category index column to an R data frame with the sqldf package? I am looking for the equivalent of the SQL:
ROW_NUMBER() over (partition by [Category] order by [Date] desc)
Suppose we have a table:
+----------+-------+------------+
| Category | Value | Date       |
+----------+-------+------------+
| apples   |     3 | 2018-07-01 |
| apples   |     2 | 2018-07-02 |
| apples   |     1 | 2018-07-03 |
| bananas  |     9 | 2018-07-01 |
| bananas  |     8 | 2018-07-02 |
| bananas  |     7 | 2018-07-03 |
+----------+-------+------------+
The desired result is:
+----------+-------+------------+-------------------+
| Category | Value | Date       | Index by category |
+----------+-------+------------+-------------------+
| apples   |     3 | 2018-07-01 |                 3 |
| apples   |     2 | 2018-07-02 |                 2 |
| apples   |     1 | 2018-07-03 |                 1 |
| bananas  |     9 | 2018-07-01 |                 3 |
| bananas  |     8 | 2018-07-02 |                 2 |
| bananas  |     7 | 2018-07-03 |                 1 |
+----------+-------+------------+-------------------+
Thanks for the hint in the comments on how to do this in many packages other than sqldf: Numbering rows within groups in a data frame
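For comparison, here is a minimal base R sketch that needs no extra packages. It uses `ave()` and `rank()` and assumes `Date` holds ISO `YYYY-MM-DD` strings, as in the sample data. `ties.method = "first"` keeps the numbers distinct when two rows in a category share a date, which mimics `ROW_NUMBER()`.

```r
# Reproducible input (same data as in the question)
DF <- read.table(text = "
Category Value Date
apples 3 2018-07-01
apples 2 2018-07-02
apples 1 2018-07-03
bananas 9 2018-07-01
bananas 8 2018-07-02
bananas 7 2018-07-03
", header = TRUE, as.is = TRUE)

# Base R analogue of
#   ROW_NUMBER() over (partition by Category order by Date desc):
# within each Category, rank the dates in descending order.
DF$seq <- ave(
  as.numeric(as.Date(DF$Date)),  # dates as day counts so they can be negated
  DF$Category,                   # one group per Category
  FUN = function(d) rank(-d, ties.method = "first")  # latest date gets 1
)
print(DF)
```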
Answer
1) PostgreSQL. This can be done with the PostgreSQL backend to sqldf:
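# When RPostgreSQL is loaded, sqldf uses the PostgreSQL backend
# (this requires access to a running PostgreSQL server).
library(RPostgreSQL)
library(sqldf)
# Identifiers are double-quoted because PostgreSQL folds unquoted names to lower case
sqldf('select *,
  ROW_NUMBER() over (partition by "Category" order by "Date" desc) as seq
  from "DF"
  order by "Category", "Date" ')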
giving:
  Category Value       Date seq
1   apples     3 2018-07-01   3
2   apples     2 2018-07-02   2
3   apples     1 2018-07-03   1
4  bananas     9 2018-07-01   3
5  bananas     8 2018-07-02   2
6  bananas     7 2018-07-03   1
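2) SQLite. To do this with the SQLite backend (the default), we need to modify the SQL statement accordingly. Make sure RPostgreSQL is not loaded before doing this. We assume the data are already sorted by date within each category, as they are in the data shown in the question. If that is not the case, it is enough to extend the SQL so that it sorts first.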
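library(sqldf)
# Self-join: for each row a, count the rows b in the same Category that come
# at or after it (b.rowid >= a.rowid). The last (latest) row of each Category
# therefore gets 1, matching ROW_NUMBER() ... order by Date desc.
sqldf("select a.*, count(*) seq
  from DF a left join DF b on a.Category = b.Category and b.rowid >= a.rowid
  group by a.rowid
  order by a.Category, a.Date")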
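The self-join above depends on the rows already being in date order within each category. Instead of sorting first, an alternative is to compare `Date` values rather than `rowid`s, so that physical row order no longer matters. This minimal sketch assumes ISO `YYYY-MM-DD` strings, which sort correctly as text. It also assumes dates are unique within each category, because tied dates would receive the same number. The rows are deliberately shuffled to show that order is irrelevant.

```r
library(sqldf)

# Reproducible input (same data as in the question)
DF <- read.table(text = "
Category Value Date
apples 3 2018-07-01
apples 2 2018-07-02
apples 1 2018-07-03
bananas 9 2018-07-01
bananas 8 2018-07-02
bananas 7 2018-07-03
", header = TRUE, as.is = TRUE)

# Shuffle the rows on purpose: the query below must not rely on row order
DF <- DF[c(6, 1, 4, 3, 5, 2), ]

# For each row a, count the rows b in the same Category whose Date is on or
# after a's Date; the latest date in each Category gets 1.
# Assumes ISO date strings and no duplicate dates within a Category.
res <- sqldf("select a.*, count(*) seq
              from DF a join DF b
                on a.Category = b.Category and b.Date >= a.Date
              group by a.rowid
              order by a.Category, a.Date")
print(res)
```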
The input DF in reproducible form is:
Lines <- "
Category Value Date
apples 3 2018-07-01
apples 2 2018-07-02
apples 1 2018-07-03
bananas 9 2018-07-01
bananas 8 2018-07-02
bananas 7 2018-07-03
"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
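As a side note, SQLite itself has supported window functions, including `ROW_NUMBER()`, since version 3.25.0. RSQLite embeds its own copy of SQLite, so with a sufficiently recent RSQLite the query from the question may run on the default SQLite backend, apart from the SQL Server style `[...]` quoting. This sketch assumes such a version is installed. It prints the embedded SQLite version first so you can check.

```r
library(sqldf)

# Reproducible input (same data as in the question)
DF <- read.table(text = "
Category Value Date
apples 3 2018-07-01
apples 2 2018-07-02
apples 1 2018-07-03
bananas 9 2018-07-01
bananas 8 2018-07-02
bananas 7 2018-07-03
", header = TRUE, as.is = TRUE)

# Which SQLite version does the RSQLite driver used by sqldf embed?
# Window functions need 3.25.0 or later.
print(sqldf("select sqlite_version()"))

# Assuming a new enough SQLite, the window function works directly
res <- sqldf("select *,
                ROW_NUMBER() over (partition by Category order by Date desc) as seq
              from DF
              order by Category, Date")
print(res)
```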