I have a data frame with a column that holds multiple comma-separated values:
fruits
1 apple,banana
2 banana,peaches
3 peaches
4 mango
Is there a way to build a dictionary of the unique fruit values, i.e. create a new column of fruits with these values:
fruits = apple,banana,peaches,mango
Update: I need the values as columns, not just a list of the unique values, so that I can build a final data frame like this:
fruits fruit_apple fruit_banana fruit_mango fruit_peaches
1 apple,banana 1 1 0 0
2 banana,peaches 0 1 0 1
3 peaches 0 0 0 1
4 mango 0 0 1 0
Answer 0 (score: 2)
We can do this easily with cSplit_e from the splitstackshape package:
library(splitstackshape)
cSplit_e(df1, "fruits", ",", type = "character", fill = 0)
# fruits fruits_apple fruits_banana fruits_mango fruits_peaches
#1 apple,banana 1 1 0 0
#2 banana,peaches 0 1 0 1
#3 peaches 0 0 0 1
#4 mango 0 0 1 0
The input data used above:
df1 <- structure(list(fruits = c("apple,banana", "banana,peaches", "peaches",
"mango")), .Names = "fruits", class = "data.frame", row.names = c("1",
"2", "3", "4"))
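If installing splitstackshape is not an option, a similar result can be sketched in base R by cross-tabulating row numbers against the split values. This is only an illustration, not how cSplit_e works internally; the names parts, long, tab, ind and res are made up for the example, and the fruit_ prefix follows the column names requested in the question.

```r
# Same input as df1 above, rebuilt so the sketch is self-contained
df1 <- data.frame(fruits = c("apple,banana", "banana,peaches", "peaches", "mango"),
                  stringsAsFactors = FALSE)

# Split each cell on commas: one character vector per row
parts <- strsplit(df1$fruits, ",", fixed = TRUE)

# Long form: one (row, fruit) pair per value; the factor keeps every row,
# even one whose cell turned out to be empty
long <- data.frame(row   = factor(rep(seq_along(parts), lengths(parts)),
                                  levels = seq_along(parts)),
                   fruit = unlist(parts),
                   stringsAsFactors = FALSE)

# Cross-tabulate rows against fruits, then reduce counts to 0/1 flags
tab <- table(long$row, long$fruit)
ind <- as.data.frame.matrix(tab)
ind[] <- lapply(ind, function(x) as.integer(x > 0))

# Prefix the indicator columns (assumed naming, taken from the question)
names(ind) <- paste0("fruit_", names(ind))
res <- cbind(df1, ind)
res
```

The columns come out in alphabetical order because table() sorts the distinct values.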
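Answer 1 (score: 1)
Do you want the new column to repeat the concatenated list? Sorry, the question isn't entirely clear. Assuming that is the case, and that your data.frame holds character strings rather than factors: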
df <- read.delim(
text="fruits
apple,banana
banana,peaches
peaches
mango",
sep="\n",
header=TRUE,
stringsAsFactors=FALSE)
df
#> fruits
#> 1 apple,banana
#> 2 banana,peaches
#> 3 peaches
#> 4 mango
df$uniquefruits <- paste0(unique(unlist(strsplit(df$fruits, split=","))), collapse=",")
df
#> fruits uniquefruits
#> 1 apple,banana apple,banana,peaches,mango
#> 2 banana,peaches apple,banana,peaches,mango
#> 3 peaches apple,banana,peaches,mango
#> 4 mango apple,banana,peaches,mango
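Or did you mean taking only those values from the first fruits column that are not repeated elsewhere?
Update: based on the comments, I think this is what you're after: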
uniquefruits <- unique(unlist(strsplit(df$fruits, split=",")))
uniquefruits
#> [1] "apple" "banana" "peaches" "mango"
df2 <- cbind(df,
sapply(uniquefruits,
function(y) apply(df, 1,
function(x) as.integer(y %in% unlist(strsplit(x, split=","))))))
df2
#> fruits apple banana peaches mango
#> 1 apple,banana 1 1 0 0
#> 2 banana,peaches 0 1 1 0
#> 3 peaches 0 0 1 0
#> 4 mango 0 0 0 1
In theory you can do this with dplyr, but I couldn't work out how to automate the per-column handling in a rowwise mutate (does anyone know how?):
library(dplyr)
df %>% rowwise() %>% mutate(apple = as.integer("apple" %in% unlist(strsplit(fruits, ","))),
banana = as.integer("banana" %in% unlist(strsplit(fruits, ","))),
peaches = as.integer("peaches" %in% unlist(strsplit(fruits, ","))),
mango = as.integer("mango" %in% unlist(strsplit(fruits, ","))))
#> Source: local data frame [4 x 5]
#> Groups: <by row>
#>
#> # A tibble: 4 x 5
#> fruits apple banana peaches mango
#> <chr> <int> <int> <int> <int>
#> 1 apple,banana 1 1 0 0
#> 2 banana,peaches 0 1 1 0
#> 3 peaches 0 0 1 0
#> 4 mango 0 0 0 1
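One possible way to avoid writing each column by hand is to reshape instead of using a rowwise mutate: split the values into rows, then spread them back out into columns. The sketch below assumes the tidyr package is available alongside dplyr (tidyr is not used in the answer above), and the helper columns id, fruit and present are names invented for the example.

```r
library(dplyr)
library(tidyr)

df <- data.frame(fruits = c("apple,banana", "banana,peaches", "peaches", "mango"),
                 stringsAsFactors = FALSE)

wide <- df %>%
  # keep the original string, add a row id and a flag to spread later
  mutate(id = row_number(), fruit = fruits, present = 1L) %>%
  # one row per individual fruit
  separate_rows(fruit, sep = ",") %>%
  # one column per distinct fruit, 0 where a row does not contain it
  pivot_wider(names_from = fruit, values_from = present, values_fill = 0L) %>%
  arrange(id) %>%
  select(-id)

wide
```

New fruits in the data get their own column automatically, since the column names are taken from the values rather than typed out.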
Answer 2 (score: 0)
In base R:
fruits <- sort(unique(unlist(strsplit(as.character(df$fruits), split=','))))
cols <- as.data.frame(matrix(rep(0, nrow(df)*length(fruits)), ncol=length(fruits)))
names(cols) <- fruits
df <- cbind.data.frame(df, cols)
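# Note: apply() coerces each row to character, so the indicator columns come back as "0"/"1" text
df <- as.data.frame(t(apply(df, 1, function(x) {
  present <- unlist(strsplit(x['fruits'], split=','))  # fruits listed in this row
  x[present] <- 1                                      # flag their columns
  x
})))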
df
fruits apple banana mango peaches
1 apple,banana 1 1 0 0
2 banana,peaches 0 1 0 1
3 peaches 0 0 0 1
4 mango 0 0 1 0
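Because apply() passes each row through a character vector, the approach above yields text columns. A variation on the same idea (preallocate zeros, then set the matching cells to 1) that keeps integer columns could look like the sketch below; parts, idx and out are names chosen for the illustration only.

```r
df <- data.frame(fruits = c("apple,banana", "banana,peaches", "peaches", "mango"),
                 stringsAsFactors = FALSE)

# Split once and collect the sorted distinct fruits
parts  <- strsplit(as.character(df$fruits), ",", fixed = TRUE)
fruits <- sort(unique(unlist(parts)))

# Integer matrix of zeros: one row per input row, one column per fruit
cols <- matrix(0L, nrow = nrow(df), ncol = length(fruits),
               dimnames = list(NULL, fruits))

# Two-column index of (row number, fruit column) for every value that occurs
idx <- cbind(rep(seq_along(parts), lengths(parts)),
             match(unlist(parts), fruits))
cols[idx] <- 1L

out <- cbind(df, cols)
out
```

Indexing a matrix with a two-column matrix sets all the listed cells in one step, so no per-row function is needed.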
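Answer 3 (score: -1)
You can use the following steps:
1) Use the strsplit function to split the data frame column on commas.
2) Flatten the resulting list of vectors into a single vector.
3) Then take the unique values of the list.fruits character vector.
Here is the solution: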
# Character vector of comma-separated fruits
f <- c("apple,banana","banana,peaches","peaches","mango")
# Wrap it in a data frame with a single column named f
fruits <- as.data.frame(f)
# Print the fruits data frame
fruits
#                f
#1    apple,banana
#2  banana,peaches
#3         peaches
#4           mango
list.fruits <- unlist(strsplit(f,split=","))
unique.fruits <- unique(list.fruits)
# Result
unique.fruits
[1] "apple" "banana" "peaches" "mango"
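The same steps can be wrapped in a small helper that also tolerates factor columns and stray spaces after the commas. This is a sketch only: unique_tokens is a hypothetical name, and the extra space in the sample data is added here to show the trimming.

```r
# Hypothetical helper: split on a separator, flatten, trim, de-duplicate
unique_tokens <- function(x, sep = ",") {
  pieces <- unlist(strsplit(as.character(x), sep, fixed = TRUE))
  unique(trimws(pieces))
}

fruits <- data.frame(f = c("apple,banana", "banana, peaches", "peaches", "mango"))
unique.fruits <- unique_tokens(fruits$f)

# Collapse back into the single comma-separated string asked for in the question
fruit.string <- paste(unique.fruits, collapse = ",")
fruit.string
```

Calling as.character() first matters on R versions before 4.0, where as.data.frame() and data.frame() turn strings into factors by default and strsplit() rejects factors.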