Question

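Below are two small tbl_df objects. The first, df, holds numeric customer IDs associated with some transactions. The second, cohort, is a single-column object containing the customer IDs that, at some point in my function, need to be identified and kept: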

> df
Source: local data frame [7 x 4]

  cust       date sales     cohort
1   12 2013-07-31    35 2013-07-01
2   13 2013-12-16    70 2013-12-01
3   14 2014-03-14    59 2014-03-01
4   15 2014-04-22    70 2014-04-01
5    9 2012-10-29    35 2012-10-01
6   10 2012-11-12    35 2012-11-01
7   11 2012-12-06   105 2012-12-01
> cohort
Source: local data frame [1 x 1]

  cust
1    9
> str(df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   7 obs. of  4 variables:
 $ cust  : num  12 13 14 15 9 10 11
 $ date  : POSIXct, format: "2013-07-31" "2013-12-16" "2014-03-14" ...
 $ sales : num  35 70 59 70 35 35 105
 $ cohort: Date, format: "2013-07-01" "2013-12-01" "2014-03-01" ...
> str(cohort)
Classes ‘tbl_df’ and 'data.frame':  1 obs. of  1 variable:
 $ cust: num 9

To do this, I figured I would use dplyr::filter(), like so:

> filter(data.frame(df), cust %in% cohort[['cust']])
Error: subscript out of bounds

This is especially strange because the fix seems to be trivial:

> foo <- cohort
> filter(data.frame(df), cust %in% foo[['cust']])
  cust       date sales     cohort
1    9 2012-10-29    35 2012-10-01

What puzzles me is that foo works where cohort does not, even though they are identical objects:

> str(cohort)
Classes ‘tbl_df’ and 'data.frame':  1 obs. of  1 variable:
 $ cust: num 9
> str(foo)
Classes ‘tbl_df’ and 'data.frame':  1 obs. of  1 variable:
 $ cust: num 9
>

Does anyone have an explanation?

I'm running R 3.2.1 with dplyr 0.4.2. If you want to reproduce this on your machine, here is the dput() output:

> dput(df)
structure(list(cust = c(12, 13, 14, 15, 9, 10, 11), date = structure(c(1375228800, 
1387152000, 1394755200, 1398124800, 1351468800, 1352678400, 1354752000
), tzone = "UTC", class = c("POSIXct", "POSIXt")), sales = c(35, 
70, 59, 70, 35, 35, 105), cohort = structure(c(15887, 16040, 
16130, 16161, 15614, 15645, 15675), class = "Date")), .Names = c("cust", 
"date", "sales", "cohort"), row.names = c(NA, 7L), class = c("tbl_df", 
"tbl", "data.frame"))
> dput(cohort)
structure(list(cust = 9), .Names = "cust", class = c("tbl_df", 
"data.frame"), row.names = c(NA, -1L))
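One thing worth checking is that df itself has a column named cohort. The base-R sketch below assumes that filter() looks up names among the data frame's columns before the calling environment, the same way with() does; under that assumption, `cohort[['cust']]` inside the call indexes the Date column df$cohort rather than the one-column cohort object. It uses a hypothetical two-row stand-in for df and base R instead of dplyr 0.4.2, so it is a simplified reproduction, not a statement about dplyr's internals.

```r
# Simplified stand-in for df: it keeps only the two columns that matter,
# including a column that shares its name with the global `cohort` object.
df <- data.frame(
  cust   = c(12, 9),
  cohort = as.Date(c("2013-07-01", "2012-10-01"))
)
cohort <- data.frame(cust = 9)

# with() evaluates the expression against df's columns first, so `cohort`
# here is the Date column df$cohort. Indexing an unnamed Date vector by the
# name "cust" fails.
res <- tryCatch(
  with(df, cust %in% cohort[["cust"]]),
  error = function(e) conditionMessage(e)
)
print(res)  # expected: "subscript out of bounds"

# `foo` is not a column of df, so the lookup falls through to the global
# object, which matches the foo workaround in the question.
foo <- cohort
keep <- with(df, cust %in% foo[["cust"]])
print(df[keep, ])  # the row with cust == 9
```

If this reproduces the error, renaming the lookup object (as with foo) or referring to the column explicitly outside the data-masked call avoids the name collision.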

Unexplained "subscript out of bounds" error in dplyr::filter()
