R - Splitting strings in a list of data frames

Time: 2017-08-14 12:26:03

Tags: r list dataframe split

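I have never worked with a list of data frames in R before. Maybe it is not even complicated, but I can't figure it out right now.

So I have a list of data frames: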

df1 <- data.frame(v5 = c(0.5,0.6,0.7,0.96),v6 = c("Tiny|Marsian|Worker", "Tiny|Human|Student", "Tiny|Goblin|Soldier", "Tiny|Horse|Guardian"))
df2 <- data.frame(v5 = c(0.56,0.32,0.55),v6 = c("Tiny|Human|Worker", "Tiny|Marsian|Student", "Tiny|Goblin|Soldier"))

ldf <- list(df1,df2)

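Each data frame contains 6 columns (only 2 in this example), and the number of rows differs between data frames. Column v6 holds three pieces of information, separated by a pipe (|). What I need to do is split this information on the pipe and make three separate columns from it, as I would get for a single data frame with:
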
library(stringr)
split = str_split_fixed(string = df1$v6, pattern = "\\|", n = 3)

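After that, I want to append the information from the second column of the split result to the respective data frame in ldf.

In the end, I want my data frames to look like this: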

    df1 <- data.frame(v5 = c(0.5,0.6,0.7,0.96),
                      v6 = c("Tiny|Marsian|Worker", "Tiny|Human|Student", "Tiny|Goblin|Soldier", "Tiny|Horse|Guardian"),
                      v7 = c("Marsian", "Human", "Goblin", "Horse"))
    df2 <- data.frame(v5 = c(0.56,0.32,0.55),
                      v6 = c("Tiny|Human|Worker", "Tiny|Marsian|Student", "Tiny|Goblin|Soldier"),
                      v7 = c("Human", "Marsian", "Goblin"))

How can I achieve this?

I have already tried a few things, such as
x <- lapply(ldf, `[`, 6)

but I run into problems as soon as I use the split functions. Please help!

2 answers:

Answer 0: (score: 0)

Using dplyr and purrr:

library('dplyr')
library('purrr')
library('stringr')  # provides str_split_fixed()
ldf2 <- map(ldf, mutate, v7 = str_split_fixed(string = v6, pattern = "\\|", n = 3)[, 2])

ldf2

[[1]]
    v5                  v6      v7
1 0.50 Tiny|Marsian|Worker Marsian
2 0.60  Tiny|Human|Student   Human
3 0.70 Tiny|Goblin|Soldier  Goblin
4 0.96 Tiny|Horse|Guardian   Horse

[[2]]
    v5                   v6      v7
1 0.56    Tiny|Human|Worker   Human
2 0.32 Tiny|Marsian|Student Marsian
3 0.55  Tiny|Goblin|Soldier  Goblin

mutate() adds a new column to the data.frame based on the string split, and map() applies this mutate() call to every element of ldf.
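
For comparison, the same "apply one function to every data frame in the list" idea can be sketched in base R, without extra packages. This is only an illustration: the helper name add_middle and the result name ldf_base are made up for this example, and it assumes the middle field is always the second pipe-separated piece.

```r
# Rebuild the example list from the question
df1 <- data.frame(v5 = c(0.5, 0.6, 0.7, 0.96),
                  v6 = c("Tiny|Marsian|Worker", "Tiny|Human|Student",
                         "Tiny|Goblin|Soldier", "Tiny|Horse|Guardian"))
df2 <- data.frame(v5 = c(0.56, 0.32, 0.55),
                  v6 = c("Tiny|Human|Worker", "Tiny|Marsian|Student",
                         "Tiny|Goblin|Soldier"))
ldf <- list(df1, df2)

# add_middle() is a hypothetical helper, not part of any package
add_middle <- function(df) {
  # as.character() guards against v6 being a factor (the default before R 4.0);
  # fixed = TRUE treats "|" literally, so no regex escaping is needed
  parts <- strsplit(as.character(df$v6), "|", fixed = TRUE)
  # keep the second piece of every split string
  df$v7 <- vapply(parts, `[`, character(1), 2)
  df
}

# lapply() plays the role of map(): one call per data frame in the list
ldf_base <- lapply(ldf, add_middle)
```

Each element of ldf_base keeps its original rows and gains a v7 column, so data frames with different numbers of rows are handled independently.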

Edit

If you want three separate columns, you should use separate() from tidyr:

ldf2 <- map(ldf, tidyr::separate, col = 'v6', into = c('Col1', 'Col2', 'Col3'), sep = '\\|')
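
By default separate() drops the column it splits. If the original v6 should stay next to the new columns, passing remove = FALSE is one option. The sketch below assumes the example data from the question; the result name ldf3 is chosen only for this illustration.

```r
library(purrr)
library(tidyr)

# Example list from the question
df1 <- data.frame(v5 = c(0.5, 0.6, 0.7, 0.96),
                  v6 = c("Tiny|Marsian|Worker", "Tiny|Human|Student",
                         "Tiny|Goblin|Soldier", "Tiny|Horse|Guardian"))
df2 <- data.frame(v5 = c(0.56, 0.32, 0.55),
                  v6 = c("Tiny|Human|Worker", "Tiny|Marsian|Student",
                         "Tiny|Goblin|Soldier"))
ldf <- list(df1, df2)

# Split v6 into three columns in every data frame, keeping v6 itself
ldf3 <- map(ldf, function(df) {
  separate(df, col = "v6", into = c("Col1", "Col2", "Col3"),
           sep = "\\|", remove = FALSE)
})

# Inspect the resulting column names of the first data frame
names(ldf3[[1]])
```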

Answer 1: (score: 0)

You can combine the lapply, tidyr::separate and do.call functions:

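library(dplyr)  # provides %>% and select()

# split v6 in each data frame, keep only the middle piece, then stack the results
combinedDF = do.call(rbind, lapply(ldf, function(x) {
  x %>%
    tidyr::separate(v6, c("v70", "v7", "v72"), sep = "\\|", remove = FALSE) %>%
    dplyr::select(-c(v70, v72))
}))

Without lapply/rbind (thanks to @Sotos):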

bind_rows(ldf) %>% 
tidyr::separate(v6,c("v70","v7","v72"), sep = "\\|", remove=FALSE) %>% 
select(-c(v70, v72))


combinedDF
#    v5                   v6      v7
#1 0.50  Tiny|Marsian|Worker Marsian
#2 0.60   Tiny|Human|Student   Human
#3 0.70  Tiny|Goblin|Soldier  Goblin
#4 0.96  Tiny|Horse|Guardian   Horse
#5 0.56    Tiny|Human|Worker   Human
#6 0.32 Tiny|Marsian|Student Marsian
#7 0.55  Tiny|Goblin|Soldier  Goblin
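
Note that stacking everything into one data frame loses track of which list element each row came from. If that matters, one possible variation is the .id argument of bind_rows(), which records the origin in an extra column so the result can be split back into a list afterwards. The column name source and the object names combined_id and ldf_again are assumptions made for this sketch.

```r
library(dplyr)
library(tidyr)

# Example list from the question
df1 <- data.frame(v5 = c(0.5, 0.6, 0.7, 0.96),
                  v6 = c("Tiny|Marsian|Worker", "Tiny|Human|Student",
                         "Tiny|Goblin|Soldier", "Tiny|Horse|Guardian"))
df2 <- data.frame(v5 = c(0.56, 0.32, 0.55),
                  v6 = c("Tiny|Human|Worker", "Tiny|Marsian|Student",
                         "Tiny|Goblin|Soldier"))
ldf <- list(df1, df2)

# .id = "source" adds a column holding the position of each data frame in ldf
combined_id <- bind_rows(ldf, .id = "source") %>%
  separate(v6, c("v70", "v7", "v72"), sep = "\\|", remove = FALSE) %>%
  select(-c(v70, v72))

# Split back into a list of data frames, one per original list element
ldf_again <- split(combined_id, combined_id$source)
```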