I have a data.frame that looks like this:
df <- data.frame(col1=c("a","b","c","d"), col2=c("1","1;2;3","5","3;2;5;5;3"), col3=c("0","1;1;0","0","0;0;1;1;0"))
# col1 col2 col3
# 1 a 1 0
# 2 b 1;2;3 1;1;0
# 3 c 5 0
# 4 d 3;2;5;5;3 0;0;1;1;0
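In short, in some rows the values of certain columns are concatenated with ";". Before reading the data.frame I do not know which columns will contain concatenated values, but I do know that they are the same columns for every row that has such values. I also know that, within a row that has concatenated values, the number of concatenated values is the same across all of those columns (row 2 has 3 values in both col2 and col3, and row 4 has 5 values in those columns).
I want to create a new data.frame in which these concatenated values are split into separate rows. For those rows, the values in the columns that do not hold concatenated values should be repeated once per concatenated value.
The resulting data.frame would be: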
df <- data.frame(col1=c("a","b","b","b","c","d","d","d","d","d"), col2=c("1","1","2","3","5","3","2","5","5","3"), col3=c("0","1","1","0","0","0","0","1","1","0"))
# col1 col2 col3
# 1 a 1 0
# 2 b 1 1
# 3 b 2 1
# 4 b 3 0
# 5 c 5 0
# 6 d 3 0
# 7 d 2 0
# 8 d 5 1
# 9 d 5 1
# 10 d 3 0
Answer 0 (score: 2)
Here is one option:
df <- data.frame(col1=c("a","b","c","d"), col2=c("1","1;2;3","5","3;2;5;5;3"), col3=c("0","1;1;0","0","0;0;1;1;0"))
df2 <- data.frame(col1=c("a","b","b","b","c","d","d","d","d","d"), col2=c("1","1","2","3","5","3","2","5","5","3"), col3=c("0","1","1","0","0","0","0","1","1","0"))
## reshape `col1` to make it look like the others
v <- Vectorize(gsub)
df$col1 <- v('\\b\\d\\b', df$col1, df$col2)
# col1 col2 col3
# 1 a 1 0
# 2 b;b;b 1;2;3 1;1;0
# 3 c 5 0
# 4 d;d;d;d;d 3;2;5;5;3 0;0;1;1;0
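## split on whitespace or `;` and coerce back into a data frame
## ([[:space:]] is used because `\\s` inside [...] may not be read as a
## whitespace class by R's default regex engine)
data.frame(do.call('cbind', lapply(df, function(x)
  unlist(strsplit(as.character(x), '[[:space:];]')))))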
# col1 col2 col3
# 1 a 1 0
# 2 b 1 1
# 3 b 2 1
# 4 b 3 0
# 5 c 5 0
# 6 d 3 0
# 7 d 2 0
# 8 d 5 1
# 9 d 5 1
# 10 d 3 0
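One caveat about the `col1` step: the pattern '\\b\\d\\b' only matches values that are a single digit, so it fits the example data but would skip a value such as "10". As a minimal sketch, assuming every value is simply a run of characters other than `;`, the pattern '[^;]+' could be used instead. The names `labels`, `values` and `expanded`, and the value "10", are made up for illustration and are not part of the original answer.

```r
# Hypothetical data: like col1/col2 in the question, but with a two-digit value
labels <- c("a", "b", "c", "d")
values <- c("1", "10;2;3", "5", "3;2;5;5;3")

# Replace every ";"-separated piece of `values` with the matching label,
# so the label column gets the same number of pieces as the value column
expanded <- mapply(function(label, value) gsub("[^;]+", label, value),
                   labels, values, USE.NAMES = FALSE)
expanded
# [1] "a"         "b;b;b"     "c"         "d;d;d;d;d"
```

This assumes the labels contain no backslashes, since gsub treats backslashes in the replacement string specially.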
Answer 1 (score: 1)
This can be done with the "splitstackshape" package, which I wrote. You can use cSplit as follows:
library(splitstackshape)
cSplit(df, c("col2", "col3"), ";", "long")
# col1 col2 col3
# 1: a 1 0
# 2: b 1 1
# 3: b 2 1
# 4: b 3 0
# 5: c 5 0
# 6: d 3 0
# 7: d 2 0
# 8: d 5 1
# 9: d 5 1
# 10: d 3 0
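The question says the columns holding concatenated values are not known in advance, while the call above names them explicitly. A minimal base R sketch for finding them first, assuming a column counts as concatenated as soon as any of its values contains ";" (`has_sep` and `split_cols` are illustrative names):

```r
# Same example data as in the question, kept as character columns
df <- data.frame(col1 = c("a", "b", "c", "d"),
                 col2 = c("1", "1;2;3", "5", "3;2;5;5;3"),
                 col3 = c("0", "1;1;0", "0", "0;0;1;1;0"),
                 stringsAsFactors = FALSE)

# TRUE for each column in which at least one value contains ";"
has_sep <- vapply(df,
                  function(x) any(grepl(";", as.character(x), fixed = TRUE)),
                  logical(1))
split_cols <- names(df)[has_sep]
split_cols
# [1] "col2" "col3"
```

`split_cols` could then be passed in place of c("col2", "col3") in the cSplit call.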
Answer 2 (score: 0)
Not as sophisticated as rawr's answer, but perhaps it is easier to see what is going on:
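df1 <- data.frame(col1=c("a","b","c","d"),
                  col2=c("1","1;2;3","5","3;2;5;5;3"),
                  col3=c("0","1;1;0","0","0;0;1;1;0"),
                  stringsAsFactors=FALSE)
df1_rows <- nrow(df1)
# split every column on ";" (col1 has no ";", so each element stays length 1)
col1_split <- strsplit(df1$col1,";")
col2_split <- strsplit(df1$col2,";")
col3_split <- strsplit(df1$col3,";")
# empty data.frame that collects the expanded rows
df2 <- data.frame(col1=character(),
                  col2=character(),
                  col3=character(),
                  stringsAsFactors=FALSE)
# build one small data.frame per original row; data.frame() recycles the
# single col1 value to the length of the split col2/col3 values
for (n in seq_len(df1_rows)) {
  df2 <- rbind(df2,
               data.frame(col1=col1_split[[n]],
                          col2=col2_split[[n]],
                          col3=col3_split[[n]],
                          stringsAsFactors=FALSE))
}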
which gives
> df2
col1 col2 col3
1 a 1 0
2 b 1 1
3 b 2 1
4 b 3 0
5 c 5 0
6 d 3 0
7 d 2 0
8 d 5 1
9 d 5 1
10 d 3 0
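The same idea can also be written without naming the columns or growing a data.frame inside a loop. This is only a sketch, assuming all columns are character and that, within a row, every column holds either one value or the same number of ";"-separated values. The names `parts`, `n`, `expanded` and `df2_alt` are illustrative.

```r
df1 <- data.frame(col1 = c("a", "b", "c", "d"),
                  col2 = c("1", "1;2;3", "5", "3;2;5;5;3"),
                  col3 = c("0", "1;1;0", "0", "0;0;1;1;0"),
                  stringsAsFactors = FALSE)

# Split every column on ";": one list per column, one element per row
parts <- lapply(df1, strsplit, split = ";", fixed = TRUE)

# Number of output rows that each input row expands to
n <- do.call(pmax, unname(lapply(parts, lengths)))

# Recycle each row's pieces to that length, then flatten column by column
expanded <- lapply(parts, function(p) unlist(Map(rep_len, p, n), use.names = FALSE))
df2_alt <- as.data.frame(expanded, stringsAsFactors = FALSE)
```

Because every column is split, nothing needs to be known in advance about which columns hold concatenated values.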