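I'm working with a data frame that follows this pattern:

Key     Data
Loc     Place1
Value1  6
Value2  7
Loc     Place2
Value3  8
Loc     Place3
Value1  9
Value2  10
Loc     Place4
Value3  11

This is a crude data set, but there is a pattern: in this example, the rows at seq(1, 100, by = 5) identify the first location of each observation. My goal is to change the key at those positions to something different, e.g. LocA instead of Loc, so that spread(key,value) gives me unique observations I can use for further analysis:

LocA    Value1  Value2  Loc     Value3
Place1  6       7       Place2  8
Place3  9       10      Place4  11

I've been using dplyr with a series of other mutate and select steps to get to this point, so I'd like to stay in the chain. I can see how to do it with appropriate subsetting outside the chain, but I'm having trouble wrapping my head around a dplyr solution.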
Answer 0 (score: 1)
Your data:
df <- structure(list(Key = c("Loc", "Value1", "Value2", "Loc", "Value3",
"Loc", "Value1", "Value2", "Loc", "Value3"), Data = c("Place1",
"6", "7", "Place2", "8", "Place3", "9", "10", "Place4", "11")), .Names = c("Key",
"Data"), row.names = c(NA, -10L), class = "data.frame")
Does this work?
library(dplyr)
library(tidyr)
df %>%
  mutate(grp = (row_number() - 1) %/% 5) %>%
  group_by(grp) %>%
  mutate(
    Key = ifelse(! duplicated(Key), Key, paste0(Key, "A"))
  ) %>%
  ungroup() %>%
  spread(Key, Data) %>%
  select(-grp)
# Source: local data frame [2 x 5]
# Loc LocA Value1 Value2 Value3
# * <chr> <chr> <chr> <chr> <chr>
# 1 Place1 Place2 6 7 8
# 2 Place3 Place4 9 10 11
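Note that the chain above tags the second Loc in each block as LocA, whereas the question asks for the first one (the rows at seq(1, 100, by = 5)) to become LocA. A minimal sketch of that variant, assuming every observation occupies exactly five rows as in the sample data; the name res is made up here for illustration:

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  Key  = c("Loc", "Value1", "Value2", "Loc", "Value3",
           "Loc", "Value1", "Value2", "Loc", "Value3"),
  Data = c("Place1", "6", "7", "Place2", "8",
           "Place3", "9", "10", "Place4", "11"),
  stringsAsFactors = FALSE
)

res <- df %>%
  mutate(
    # one id per block of five rows (assumption: fixed block size)
    grp = (row_number() - 1) %/% 5,
    # rows 1, 6, 11, ... hold the first location of each block
    Key = ifelse(row_number() %% 5 == 1, "LocA", Key)
  ) %>%
  spread(Key, Data) %>%
  # put the columns in the order shown in the question and drop grp
  select(LocA, Value1, Value2, Loc, Value3)
```

Since no key is repeated within a block after the rename, spread() no longer sees duplicates, so the group_by()/duplicated() step is not needed in this variant.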
Answer 1 (score: 1)
Here is another approach. I admit it is not as good as r2evans's answer above.
df <- structure(list(Key = c("Loc", "Value1", "Value2", "Loc", "Value3",
"Loc", "Value1", "Value2", "Loc", "Value3"), Data = c("Place1",
"6", "7", "Place2", "8", "Place3", "9", "10", "Place4", "11")), .Names = c("Key",
"Data"), row.names = c(NA, -10L), class = "data.frame")
library(dplyr)
library(tidyr)
library(stringr)  # for str_c()
df %>%
  mutate(gid = ceiling(row_number() / 5)) %>%
  group_by(gid) %>%
  summarize(concatenated_text = str_c(Data, collapse = ",")) %>%
  separate(concatenated_text, into = c("LocA", "Value1", "Value2", "Loc", "Value3"), sep = ",")
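One caveat with this approach: because the rows are collapsed with "," and then split again, any Data value that itself contains a comma would be split in the wrong place. A sketch of the same positional idea without the string round trip, assuming tidyr 1.0.0 or later (for pivot_wider()) and exactly five rows per observation; new_keys and wide are names made up here:

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  Key  = c("Loc", "Value1", "Value2", "Loc", "Value3",
           "Loc", "Value1", "Value2", "Loc", "Value3"),
  Data = c("Place1", "6", "7", "Place2", "8",
           "Place3", "9", "10", "Place4", "11"),
  stringsAsFactors = FALSE
)

# Column names in the order the rows appear within each block of five
new_keys <- c("LocA", "Value1", "Value2", "Loc", "Value3")

wide <- df %>%
  mutate(
    gid = ceiling(row_number() / 5),          # block id, as in the answer above
    Key = rep(new_keys, length.out = n())     # assign keys purely by position
  ) %>%
  pivot_wider(names_from = Key, values_from = Data)
```

Like the separate() version, this keeps the gid column and relies entirely on row order, so it only holds while every block has the same five rows in the same order.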