When I use dplyr to create a column of counts, it appears to be populated correctly until I try to use the counts column on its own. Example: I create this data frame:

library(dplyr)
V1 <- c("TEST", "test", "tEsT", "tesT", "TesTing", "testing","ME-TESTED", "re tested", "RE testing")
V2 <- c("othertest", "anothertest", "testing", "123", "random stuff", "irrelevant", "tested", "re-test", "tests")
V3 <- c("type1", "type2", "type1", "type2", "type3", "type2", "type2", "type2", "type1")
df <- data.frame(V1, V2, V3)

Then I use dplyr to create a column of counts:

df$counts <- df %>% group_by(V3) %>% mutate(count = n())

This gives the expected result:

> df
          V1           V2    V3  counts.V1    counts.V2 counts.V3 counts.count
1       TEST    othertest type1       TEST    othertest     type1            3
2       test  anothertest type2       test  anothertest     type2            5
3       tEsT      testing type1       tEsT      testing     type1            3
4       tesT          123 type2       tesT          123     type2            5
5    TesTing random stuff type3    TesTing random stuff     type3            1
6    testing   irrelevant type2    testing   irrelevant     type2            5
7  ME-TESTED       tested type2  ME-TESTED       tested     type2            5
8  re tested      re-test type2  re tested      re-test     type2            5
9 RE testing        tests type1 RE testing        tests     type1            3

However, when I try to use the counts.count column in any way, the result is NULL:

> df$counts.count
NULL

The result is the same for the other columns created by dplyr, but the rest of the data frame seems normal.

I am completely unclear on why printing the whole df gives me different output than printing just the column of interest. What am I missing here?
Answer 0 (score: 1)
If you go back and re-create the data frame, then skip the assignment and just print the result to the screen, you see:

df %>% group_by(V3) %>% mutate(count = n())
Source: local data frame [9 x 4]
Groups: V3 [3]

          V1           V2     V3 count
      <fctr>       <fctr> <fctr> <int>
1       TEST    othertest  type1     3
2       test  anothertest  type2     5
3       tEsT      testing  type1     3
4       tesT          123  type2     5
5    TesTing random stuff  type3     1
6    testing   irrelevant  type2     5
7  ME-TESTED       tested  type2     5
8  re tested      re-test  type2     5
9 RE testing        tests  type1     3

The pipeline returns a whole data frame with four columns, not a single vector. If you now do the assignment, the structure is rather messed up, and I think you might have gotten a more informative error if there had been fewer unique values of V1 or V2:

df$counts <- df %>% group_by(V3) %>% mutate(count = n())
# snipped what you already showed
str(df)
#-----
'data.frame': 9 obs. of 4 variables:
 $ V1    : Factor w/ 9 levels "ME-TESTED","re tested",..: 7 4 6 5 9 8 1 2 3
 $ V2    : Factor w/ 9 levels "123","anothertest",..: 4 2 8 1 5 3 7 6 9
 $ V3    : Factor w/ 3 levels "type1","type2",..: 1 2 1 2 3 2 2 2 1
 $ counts:Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 9 obs. of 4 variables:
  ..$ V1   : Factor w/ 9 levels "ME-TESTED","re tested",..: 7 4 6 5 9 8 1 2 3
  ..$ V2   : Factor w/ 9 levels "123","anothertest",..: 4 2 8 1 5 3 7 6 9
  ..$ V3   : Factor w/ 3 levels "type1","type2",..: 1 2 1 2 3 2 2 2 1
  ..$ count: int 3 5 3 5 1 5 5 5 3
  ..- attr(*, "vars")=List of 1
  .. ..$ : symbol V3
  ..- attr(*, "labels")='data.frame': 3 obs. of 1 variable:
  .. ..$ V3: Factor w/ 3 levels "type1","type2",..: 1 2 3
  .. ..- attr(*, "vars")=List of 1
  .. .. ..$ : symbol V3
  .. ..- attr(*, "drop")= logi TRUE
  ..- attr(*, "indices")=List of 3
  .. ..$ : int 0 2 8
  .. ..$ : int 1 3 5 6 7
  .. ..$ : int 4
  ..- attr(*, "drop")= logi TRUE
  ..- attr(*, "group_sizes")= int 3 5 1
  ..- attr(*, "biggest_group_size")= int 5

The formatting you are seeing is how R displays a matrix embedded in a data frame: the inner columns are printed with a counts. prefix, but df itself has only one extra column, named counts, which is why df$counts.count is NULL. Objects of class table (and maybe tbl?) appear to be handled like the matrix class here.
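
For what it's worth, here is a minimal sketch of how this could be checked and fixed, assuming the goal was simply to add a per-group count column to df. The name nested and the ungroup() call are my own additions for illustration, not something from the question.

```r
library(dplyr)

V1 <- c("TEST", "test", "tEsT", "tesT", "TesTing", "testing", "ME-TESTED", "re tested", "RE testing")
V2 <- c("othertest", "anothertest", "testing", "123", "random stuff", "irrelevant", "tested", "re-test", "tests")
V3 <- c("type1", "type2", "type1", "type2", "type3", "type2", "type2", "type2", "type1")
df <- data.frame(V1, V2, V3)

# What the question did: store an entire grouped data frame in ONE column.
# 'nested' is a hypothetical name used only for this illustration.
nested <- df
nested$counts <- nested %>% group_by(V3) %>% mutate(count = n())

names(nested)                 # "V1" "V2" "V3" "counts" (no "counts.count")
is.null(nested$counts.count)  # TRUE: that name only exists in the printed header
nested$counts$count           # the counts live inside the nested data frame

# A likely fix: assign the whole result back to df, not to df$counts.
# ungroup() is an assumption on my part so later verbs are not grouped by V3.
df <- df %>%
  group_by(V3) %>%
  mutate(count = n()) %>%
  ungroup()

df$count                      # 3 5 3 5 1 5 5 5 3
```

If the nested layout has to stay for some reason, the counts are still reachable as df$counts$count, as the third check above shows.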