Question

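I am trying to create new columns in my dataset. So far I have imported a JSON file into R that contains a column full of different words ("purple", "red", "blue", etc.). Each observation has some combination of these words. My goal is to create new columns named after the words ("purple", "red", "blue", etc.). I want each column to hold True or False, depending on whether the observation shows that color. I have tried the subset function as well as doing this by hand, but there are over 300 observations, which makes that very inconvenient. I would greatly appreciate any help!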

For example:

Observations     Color
1                Blue
2                Red, Blue
3                Blue, Green
4                Purple
5                Yellow, Orange

Now I would like:

Observations     Color       Red       Yellow        Orange    Blue
1                Blue        False     False         False     True
2                Red, Blue   True      False         False     True

etc.

This is my first question on this site, so I apologize if anything is wrong with it.

Answer 1

You can simply loop over the names of the columns you want to create and use grepl to check whether each name occurs in the Color column:

dat <- read.table(text="Observations     Color
1                Blue
2                Red,Blue
3                Blue,Green
4                Purple
5                Yellow,Orange", header=TRUE, stringsAsFactors=FALSE)
# I removed the space after the commas to facilitate the data.frame creation.

cols <- c("Red", "Yellow", "Orange", "Blue")

for (i in cols) dat[[i]] <- grepl(i, dat$Color)

Result:

> dat
  Observations         Color   Red Yellow Orange  Blue
1            1          Blue FALSE  FALSE  FALSE  TRUE
2            2      Red,Blue  TRUE  FALSE  FALSE  TRUE
3            3    Blue,Green FALSE  FALSE  FALSE  TRUE
4            4        Purple FALSE  FALSE  FALSE FALSE
5            5 Yellow,Orange FALSE   TRUE   TRUE FALSE

Edit:

If you want a column for every color, a better way to build the vector is the one Robert suggested in the comments:

cols <- unique(unlist(strsplit(dat$Color, ",")))
#You might have to change from "," to ", " if you have white spaces after the commas
#or even ",\\s?" if they aren't always there.

The new result would be:

  Observations         Color   Red Yellow Orange  Blue Green Purple
1            1          Blue FALSE  FALSE  FALSE  TRUE FALSE  FALSE
2            2      Red,Blue  TRUE  FALSE  FALSE  TRUE FALSE  FALSE
3            3    Blue,Green FALSE  FALSE  FALSE  TRUE  TRUE  FALSE
4            4        Purple FALSE  FALSE  FALSE FALSE FALSE   TRUE
5            5 Yellow,Orange FALSE   TRUE   TRUE FALSE FALSE  FALSE
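
One caveat with the loop above: grepl does pattern (substring) matching, so a name such as "Red" would also match a hypothetical entry like "DarkRed". The following is a minimal sketch of an exact-match variant, assuming the colors are separated by a comma with optional whitespace. The sample data mirrors the question and the variable name tokens is introduced here only for illustration.

```r
# Sample data mirroring the question (note the space after each comma).
dat <- data.frame(
  Observations = 1:5,
  Color = c("Blue", "Red, Blue", "Blue, Green", "Purple", "Yellow, Orange"),
  stringsAsFactors = FALSE
)

# Split every entry into its individual color tokens;
# ",\\s*" consumes the comma and any whitespace that follows it.
tokens <- strsplit(dat$Color, ",\\s*")

# All distinct colors, in order of first appearance.
cols <- unique(unlist(tokens))

# One logical column per color: TRUE only when the color is
# exactly one of the tokens of that row.
for (col in cols) {
  dat[[col]] <- vapply(tokens, function(tk) col %in% tk, logical(1))
}

print(dat)
```

Because each row is compared token by token with %in%, no regular-expression escaping is needed for color names that contain special characters.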

Answer 2

Try something like this:

example <- data.frame(colors = c("A,B", "A", "B", "F", "C", "C,G", "C", "D", "E", "F"), stringsAsFactors = FALSE)
cols <- sort(unique(unlist(strsplit(example$colors, ",", fixed = TRUE))))
dummies <- sapply(cols, function(co) grepl(co, example$colors))
dummies

          A     B     C     D     E     F     G
 [1,]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
 [2,]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
 [3,] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
 [4,] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
 [5,] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
 [6,] FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
 [7,] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
 [8,] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
 [9,] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
[10,] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
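
The result above is a logical matrix rather than a set of columns on the data frame. As a rough sketch of the remaining step, the matrix can be attached with cbind. The name result is introduced here for illustration, and fixed = TRUE is an added assumption so that the values are treated as literal strings rather than regular expressions.

```r
example <- data.frame(
  colors = c("A,B", "A", "B", "F", "C", "C,G", "C", "D", "E", "F"),
  stringsAsFactors = FALSE
)

# Distinct values across all rows, sorted alphabetically.
cols <- sort(unique(unlist(strsplit(example$colors, ",", fixed = TRUE))))

# Logical matrix with one column per value; sapply names the columns after cols.
dummies <- sapply(cols, function(co) grepl(co, example$colors, fixed = TRUE))

# Attach the indicator columns to the original data frame.
result <- cbind(example, dummies)
print(result)
```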

Answer 3

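The base R approach is simpler to explain than the dplyr one, but for anyone interested, here is a dplyr solution:

library(dplyr)
library(lazyeval)  # provides interp()
cols <- unique(unlist(strsplit(dat$Color, ",", fixed = TRUE)))
dat %>% mutate_(.dots = sapply(cols, function(col) interp(~grepl(col, Color), col = col)))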

Here is a way to do it with plyr and magrittr:

cols %>% 
  laply(grepl, dat$Color) %>%
  t %>%
  data.frame %>%
  setNames(cols) %>%
  cbind(dat, .)

Another one:

dat %>% adply(1, . %$%
                Color %>%
                strsplit(",") %>%
                extract2(1) %>%
                factor(levels = cols) %>%
                table %>%
                is_greater_than(0))

This takes advantage of the fact that magrittr lets you build an anonymous function out of a chain of pipes.
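
For readers who do not use plyr or magrittr, the factor/table idea in the last pipeline can be sketched in plain base R. This is only an illustration under the assumption that the colors are comma-separated without spaces. The helper row_flags and the name flags are hypothetical and not part of the original answer.

```r
dat <- data.frame(
  Observations = 1:5,
  Color = c("Blue", "Red,Blue", "Blue,Green", "Purple", "Yellow,Orange"),
  stringsAsFactors = FALSE
)
cols <- unique(unlist(strsplit(dat$Color, ",", fixed = TRUE)))

# For one entry: split it, count how often each known color occurs,
# and flag the colors whose count is greater than zero.
row_flags <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)[[1]]
  as.vector(table(factor(parts, levels = cols)) > 0)
}

# vapply returns one column per row of dat, so transpose the result.
flags <- t(vapply(dat$Color, row_flags, logical(length(cols)), USE.NAMES = FALSE))
colnames(flags) <- cols

result <- cbind(dat, flags)
print(result)
```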

Creating a new True/False column in R
