Group by, convert to factor and extract levels in integer form in R?

Asked: 2018-08-19 13:35:09

Tags: r dataframe dplyr

I have 2 data frames with identical columns (vars) and 2 different user ids:

df1:

df1 <- structure(list(user_id = c(1, 1, 1, 1, 1, 1), obs_id = c("717b1913-0c0f-4963-8bc9-81a06a3bb1c0", 
"717b1913-0c0f-4963-8bc9-81a06a3bb1c0", "717b1913-0c0f-4963-8bc9-81a06a3bb1c0", 
"717b1913-0c0f-4963-8bc9-81a06a3bb1c0", "717b1913-0c0f-4963-8bc9-81a06a3bb1c0", 
"717b1913-0c0f-4963-8bc9-81a06a3bb1c0"), timestamp = c(337837075445301, 
337837075445301, 337837077455301, 337837077455301, 337837079457301, 
337837079457301), acc_x = c(0.5363176, 0.5363176, 0.5243462, 
0.5243462, 0.5243462, 0.5243462), acc_y = c(6.4693303, 6.4693303, 
6.4693303, 6.4693303, 6.4693303, 6.4693303), acc_z = c(6.8093176, 
6.8093176, 6.821289, 6.821289, 6.821289, 6.821289)), .Names = c("user_id", 
"obs_id", "timestamp", "acc_x", "acc_y", "acc_z"), row.names = c(NA, 
6L), class = "data.frame")

and df2:

df2 <- structure(list(user_id = c(2, 2, 2, 2, 2, 2), obs_id = c("8027eac3-8839-498e-98b9-3b46da98d1f4", 
"8027eac3-8839-498e-98b9-3b46da98d1f4", "8027eac3-8839-498e-98b9-3b46da98d1f4", 
"8027eac3-8839-498e-98b9-3b46da98d1f4", "8027eac3-8839-498e-98b9-3b46da98d1f4", 
"8027eac3-8839-498e-98b9-3b46da98d1f4"), timestamp = c(336965414272993, 
336965414272993, 336965414272993, 336965416627384, 336965418627300, 
336965420627376), acc_x = c(-1, -1, -1, 0.81644773, 0.80208206, 
0.8140534), acc_y = c(-1, -1, -1, 6.648901, 6.646507, 6.651295), 
acc_z = c(-1, -1, -1, 7.2618356, 7.257047, 7.233104)), .Names = c("user_id", 
"obs_id", "timestamp", "acc_x", "acc_y", "acc_z"), row.names = c(NA, 
6L), class = "data.frame")

Now I want to bind them, group by user_id, then factorize obs_id and extract an integer column from it:

bind_rows(df1, df2) %>% 
  group_by(user_id) %>% 
  mutate(obs_id = as_factor(obs_id), 
         replicate = as.numeric(levels(obs_id)))

This returns the error:

Error in mutate_impl(.data, dots) : Column `replicate` must be length 6 (the group size) or one, not 0

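Could someone tell me what I am doing wrong here?

I want to convert obs_id to a factor and "encode" it as integers in the replicate column, instead of the long levels observed in obs_id.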

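1 Answer:

Answer 0 (score: 1)

After binding the datasets, convert 'obs_id' to a factor first and only then do the group_by. Converting to factor inside the group_by causes a conflict, because the levels can differ from group to group. An easier option is to match 'obs_id' against its unique elements: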

bind_rows(df1, df2) %>% 
  group_by(user_id) %>% 
  mutate(Rep = match(obs_id, unique(obs_id)))
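
To make the numbering visible, here is a minimal sketch on a made-up data frame. The name toy and its values are hypothetical and not part of the question's data; the first user deliberately has two distinct obs_id values so that Rep goes above 1.

```r
library(dplyr)

# Hypothetical toy data: user 1 has two distinct obs_id values, user 2 has one.
toy <- data.frame(
  user_id = c(1, 1, 1, 2, 2),
  obs_id  = c("x", "x", "y", "z", "z"),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(user_id) %>%
  # Position of each obs_id among the distinct obs_id values of its own group,
  # in order of first appearance.
  mutate(Rep = match(obs_id, unique(obs_id))) %>%
  ungroup()

res$Rep
# 1 1 2 1 1
```

The numbering restarts at 1 inside every user_id, and obs_id itself stays a plain character column.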

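The issue is in storing a factor column whose levels differ within each 'user_id'. If the goal is only to get the 'Rep' column, there is no need for an intermediate factor column: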

bind_rows(df1, df2) %>% 
     group_by(user_id) %>% 
     mutate(Rep = as.integer(factor(obs_id)))
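
The two approaches can number the ids differently when a group holds more than one obs_id. The sketch below uses hypothetical values in base R only: match() with unique() numbers by first appearance, while factor() sorts its levels by default before assigning codes.

```r
# Hypothetical ids: "b-uuid" appears first but sorts after "a-uuid".
ids <- c("b-uuid", "b-uuid", "a-uuid")

by_appearance <- match(ids, unique(ids))  # numbered in order of first appearance
by_sorted     <- as.integer(factor(ids))  # numbered by sorted level order

by_appearance
# 1 1 2
by_sorted
# 2 2 1
```

For the data in the question each user_id has a single obs_id, so both versions give Rep = 1 on every row.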