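Below are two small tbl_df objects: the first, df, has numeric customer IDs associated with some transactions; the second, cohort, is a single-column object whose customer IDs need to be identified and kept at some point in my function: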
> df
Source: local data frame [7 x 4]
  cust       date sales     cohort
1   12 2013-07-31    35 2013-07-01
2   13 2013-12-16    70 2013-12-01
3   14 2014-03-14    59 2014-03-01
4   15 2014-04-22    70 2014-04-01
5    9 2012-10-29    35 2012-10-01
6   10 2012-11-12    35 2012-11-01
7   11 2012-12-06   105 2012-12-01
> cohort
Source: local data frame [1 x 1]
  cust
1    9
> str(df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 7 obs. of 4 variables:
$ cust : num 12 13 14 15 9 10 11
$ date : POSIXct, format: "2013-07-31" "2013-12-16" "2014-03-14" ...
$ sales : num 35 70 59 70 35 35 105
$ cohort: Date, format: "2013-07-01" "2013-12-01" "2014-03-01" ...
> str(cohort)
Classes ‘tbl_df’ and 'data.frame': 1 obs. of 1 variable:
$ cust: num 9
To do this, I thought I would use dplyr::filter(), as follows:
> filter(data.frame(df), cust %in% cohort[['cust']])
Error: subscript out of bounds
This is particularly odd because the fix seems trivial:
> foo <- cohort
> filter(data.frame(df), cust %in% foo[['cust']])
  cust       date sales     cohort
1    9 2012-10-29    35 2012-10-01
What baffles me is that foo works where cohort does not. They are identical objects:
> str(cohort)
Classes ‘tbl_df’ and 'data.frame': 1 obs. of 1 variable:
$ cust: num 9
> str(foo)
Classes ‘tbl_df’ and 'data.frame': 1 obs. of 1 variable:
$ cust: num 9
>
Does anyone have an explanation?
I am running R 3.2.1 with dplyr 0.4.2. If you want to reproduce this on your machine, here is the dput() output:
> dput(df)
structure(list(cust = c(12, 13, 14, 15, 9, 10, 11), date = structure(c(1375228800,
1387152000, 1394755200, 1398124800, 1351468800, 1352678400, 1354752000
), tzone = "UTC", class = c("POSIXct", "POSIXt")), sales = c(35,
70, 59, 70, 35, 35, 105), cohort = structure(c(15887, 16040,
16130, 16161, 15614, 15645, 15675), class = "Date")), .Names = c("cust",
"date", "sales", "cohort"), row.names = c(NA, 7L), class = c("tbl_df",
"tbl", "data.frame"))
> dput(cohort)
structure(list(cust = 9), .Names = "cust", class = c("tbl_df",
"data.frame"), row.names = c(NA, -1L))