I have two data frames with the same columns (variables) but two different user IDs.
df1:
df1 <- structure(list(user_id = c(1, 1, 1, 1, 1, 1), obs_id = c("717b1913-0c0f-4963-8bc9-81a06a3bb1c0",
"717b1913-0c0f-4963-8bc9-81a06a3bb1c0", "717b1913-0c0f-4963-8bc9-81a06a3bb1c0",
"717b1913-0c0f-4963-8bc9-81a06a3bb1c0", "717b1913-0c0f-4963-8bc9-81a06a3bb1c0",
"717b1913-0c0f-4963-8bc9-81a06a3bb1c0"), timestamp = c(337837075445301,
337837075445301, 337837077455301, 337837077455301, 337837079457301,
337837079457301), acc_x = c(0.5363176, 0.5363176, 0.5243462,
0.5243462, 0.5243462, 0.5243462), acc_y = c(6.4693303, 6.4693303,
6.4693303, 6.4693303, 6.4693303, 6.4693303), acc_z = c(6.8093176,
6.8093176, 6.821289, 6.821289, 6.821289, 6.821289)), .Names = c("user_id",
"obs_id", "timestamp", "acc_x", "acc_y", "acc_z"), row.names = c(NA,
6L), class = "data.frame")
and df2:
df2 <- structure(list(user_id = c(2, 2, 2, 2, 2, 2), obs_id = c("8027eac3-8839-498e-98b9-3b46da98d1f4",
"8027eac3-8839-498e-98b9-3b46da98d1f4", "8027eac3-8839-498e-98b9-3b46da98d1f4",
"8027eac3-8839-498e-98b9-3b46da98d1f4", "8027eac3-8839-498e-98b9-3b46da98d1f4",
"8027eac3-8839-498e-98b9-3b46da98d1f4"), timestamp = c(336965414272993,
336965414272993, 336965414272993, 336965416627384, 336965418627300,
336965420627376), acc_x = c(-1, -1, -1, 0.81644773, 0.80208206,
0.8140534), acc_y = c(-1, -1, -1, 6.648901, 6.646507, 6.651295
), acc_z = c(-1, -1, -1, 7.2618356, 7.257047, 7.233104)), .Names = c("user_id",
"obs_id", "timestamp", "acc_x", "acc_y", "acc_z"), row.names = c(NA,
6L), class = "data.frame")
Now I want to bind them, group by user_id, then convert obs_id to a factor and extract an integer column from it:
library(dplyr)    # bind_rows(), group_by(), mutate(), %>%
library(forcats)  # as_factor()
bind_rows(df1,df2) %>% group_by(user_id) %>% mutate(obs_id = as_factor(obs_id), replicate = as.numeric(levels(obs_id)))
This returns the error:
Error in mutate_impl(.data, dots) : Column `replicate` must be length 6 (the group size) or one, not 0
Can anyone tell me what I am doing wrong here?
I want to convert the obs_id column to a factor column and "code" it as an integer in replicate, instead of the long levels observed in obs_id.
Answer 0 (score: 1)
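After binding the datasets, convert 'obs_id' to a factor first and only then apply group_by, because converting to a factor inside group_by creates a conflict: the levels can differ from group to group. An easier option is to match 'obs_id' against its unique elements: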
bind_rows(df1, df2) %>%
group_by(user_id) %>%
mutate(Rep = match(obs_id, unique(obs_id)))
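To see what the match/unique idiom does on its own, here is a minimal base R sketch on a plain character vector. The vector `ids` and the name `rep_code` are made up for illustration and are not part of the question's data; inside group_by the same expression is simply evaluated once per 'user_id'.

```r
# Hypothetical IDs, standing in for the long obs_id strings
ids <- c("b", "b", "a", "b", "c", "a")

# unique() keeps the IDs in order of first appearance: "b", "a", "c".
# match() then returns, for each element, its position in that vector,
# so every distinct ID gets an integer code starting at 1.
rep_code <- match(ids, unique(ids))

print(rep_code)
```

The codes follow the order in which each ID first appears, so the first ID seen is always coded 1.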
The issue is storing a factor column whose levels differ within each 'user_id'. If the goal is just to get the 'Rep' column, no intermediate factor column is needed:
bind_rows(df1, df2) %>%
group_by(user_id) %>%
mutate(Rep = as.integer(factor(obs_id)))
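One thing worth knowing when choosing between the two snippets: they can number the same IDs differently. The sketch below uses a made-up vector `ids` (an assumption for illustration, not the question's data) to compare them in base R. By default factor() sorts its levels, whereas match/unique codes by first appearance.

```r
# Hypothetical IDs, standing in for the long obs_id strings
ids <- c("b", "b", "a", "b", "c", "a")

# Codes by order of first appearance: "b" = 1, "a" = 2, "c" = 3
by_appearance <- match(ids, unique(ids))

# factor() sorts the levels by default ("a", "b", "c"),
# so the integer codes follow alphabetical order instead
by_sorted_level <- as.integer(factor(ids))

# Passing the levels explicitly makes factor() agree with match/unique
by_explicit_level <- as.integer(factor(ids, levels = unique(ids)))

print(by_appearance)
print(by_sorted_level)
print(by_explicit_level)
```

With only one 'obs_id' per user, as in the sample data, both approaches give 1 for every row, so the difference only matters once a user has several distinct IDs.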