Mutate with dplyr at a specific set of indices

Time: 2016-06-21 21:44:43

Tags: r

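I am working with a data frame that follows this pattern:

Key     Data
Loc     Place1
Value1  6
Value2  7
Loc     Place2
Value3  8
Loc     Place3
Value1  9
Value2  10
Loc     Place4
Value3  11

This is a rough dataset in which a pattern exists: in this example, the rows at seq(1,100,by=5) identify the first location of an observation. My goal is to mutate the Key at those positions to something different, e.g. LocA instead of Loc, so that spread(key,value) gives me unique observations I can use for further analysis:

LocA    Value1  Value2  Loc     Value3
Place1  6       7       Place2  8
Place3  9       10      Place4  11

I have been using dplyr and a series of other mutates and selects to get to this point, so I would like to stay in the chain. I can see how to do it with appropriate subsetting outside the chain, but I am having trouble wrapping my head around a dplyr solution.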

2 answers:

Answer 0 (score: 1)

Your data:

df <- structure(list(Key = c("Loc", "Value1", "Value2", "Loc", "Value3", 
"Loc", "Value1", "Value2", "Loc", "Value3"), Data = c("Place1", 
"6", "7", "Place2", "8", "Place3", "9", "10", "Place4", "11")), .Names = c("Key", 
"Data"), row.names = c(NA, -10L), class = "data.frame")

Does this work?

library(dplyr)
library(tidyr)
df %>%
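  # Block id: rows 1-5 -> 0, rows 6-10 -> 1 (assumes 5 rows per observation)
  mutate(grp = (row_number() - 1) %/% 5) %>%
  group_by(grp) %>%
  mutate(
    # Within each block, suffix a repeated key with "A" so every key is unique
    Key = ifelse(! duplicated(Key), Key, paste0(Key, "A"))
  ) %>%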
  ungroup() %>%
  spread(Key, Data) %>%
  select(-grp)
# Source: local data frame [2 x 5]
#      Loc   LocA Value1 Value2 Value3
# *  <chr>  <chr>  <chr>  <chr>  <chr>
# 1 Place1 Place2      6      7      8
# 2 Place3 Place4      9     10     11
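
The code above renames the second Loc in each block, so Place1 ends up under Loc and Place2 under LocA. The question asked for the opposite: renaming the rows at fixed positions (1, 6, ...). A minimal sketch of that index-based variant follows. It assumes every observation spans exactly 5 rows; the `obs` column and the `result` name are introduced here only for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Key  = c("Loc", "Value1", "Value2", "Loc", "Value3",
           "Loc", "Value1", "Value2", "Loc", "Value3"),
  Data = c("Place1", "6", "7", "Place2", "8",
           "Place3", "9", "10", "Place4", "11"),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(
    # Rows 1, 6, ... open a new observation (assumption: 5 rows each)
    Key = if_else(row_number() %in% seq(1, n(), by = 5), "LocA", Key),
    # Hypothetical helper column so spread() knows which rows belong together
    obs = (row_number() - 1) %/% 5
  ) %>%
  spread(Key, Data) %>%
  select(-obs)
```

Using `seq(1, n(), by = 5)` rather than a hard-coded upper bound keeps the index set tied to the actual number of rows. If the blocks are not all the same length, this positional approach does not apply as written.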

Answer 1 (score: 1)

Here is another approach. I admit it is not as good as r2evans's answer above.

df <- structure(list(Key = c("Loc", "Value1", "Value2", "Loc", "Value3", 
"Loc", "Value1", "Value2", "Loc", "Value3"), Data = c("Place1", 
"6", "7", "Place2", "8", "Place3", "9", "10", "Place4", "11")), .Names = c("Key", 
"Data"), row.names = c(NA, -10L), class = "data.frame")

library(dplyr)
library(tidyr)
library(stringr)  # provides str_c()

df %>% 
  mutate(gid = ceiling(row_number() / 5)) %>%
  group_by(gid) %>%
  summarize(concatenated_text = str_c(Data, collapse = ",")) %>%
  separate(concatenated_text, into = c("LocA", "Value1", "Value2", "Loc", "Value3"), sep=",")
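
The pipeline above depends on str_c() from stringr. As a sketch of the same idea without that dependency, base R's paste() with `collapse` performs the same concatenation. This version also passes `convert = TRUE` to separate() so the numeric columns are not left as character. It assumes 5 rows per observation and that no Data value contains a comma; the `wide` name is used here only for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Key  = c("Loc", "Value1", "Value2", "Loc", "Value3",
           "Loc", "Value1", "Value2", "Loc", "Value3"),
  Data = c("Place1", "6", "7", "Place2", "8",
           "Place3", "9", "10", "Place4", "11"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # Block id: rows 1-5 -> 1, rows 6-10 -> 2
  mutate(gid = ceiling(row_number() / 5)) %>%
  group_by(gid) %>%
  # Collapse each block's values into one comma-separated string
  summarize(concatenated_text = paste(Data, collapse = ",")) %>%
  # Split the string back out into named columns, in row order
  separate(concatenated_text,
           into = c("LocA", "Value1", "Value2", "Loc", "Value3"),
           sep = ",", convert = TRUE)
```

Because the column names are supplied by position in `into`, this only works when every block lists its keys in the same order.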