Question

I have a data frame that looks like this:

   ID Time           Item
1 S001   P1           1/2/
2 S002   P1       2/10/7/9
3 S003   P1 1/2/4/5/6/10/9
4 S004   P1 1/2/5/6/10/7/9
5 S005   P1     1/2/10/7/9
6 S006   P1      2/5/6/7/9

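I want to search the 'Item' column and create a new column that equals 1 if the Item column contains 1 and 0 if it does not. This is similar to what the grepl function does, but I want it to produce 1 and 0 rather than TRUE and FALSE.

That is, my dataset would look like this: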

    ID Time           Item Item1
1 S001   P1           1/2/     1
2 S002   P1       2/10/7/9     0
3 S003   P1 1/2/4/5/6/10/9     1
4 S004   P1 1/2/5/6/10/7/9     1
5 S005   P1     1/2/10/7/9     1
6 S006   P1      2/5/6/7/9     0

I want to do this all the way up to ten columns (the idea is to turn the 'Item' column into a matrix of 1s and 0s).

    ID Time           Item Item1 Item2 Item3 Item4 Item5 Item6 Item7
1 S001   P1           1/2/     1     1     0     0     0     0     0
2 S002   P1       2/10/7/9     0     1     0     0     0     0     1
3 S003   P1 1/2/4/5/6/10/9     1     1     0     1     1     1     0

Answer 1

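The direct solution is to use str_detect, or equivalently grepl (which I am less familiar with), and then use as.integer to convert TRUE to 1 and FALSE to 0.

Edit: added some lookarounds to make the regex more robust. The pattern (?<!\\d)1(?!\\d) now checks that there is no digit immediately before or after the 1, so the 1 in 10 is not counted as a match. However, I think the second approach below is safer.

library(tidyverse)

tbl <- read_table2(
"ID Time Item
S001 P1 1/2/
S002 P1 2/10/7/9
S003 P1 1/2/4/5/6/10/9
S004 P1 1/2/5/6/10/7/9
S005 P1 1/2/10/7/9
S006 P1 2/5/6/7/9"
)

tbl %>%
  mutate(
    # 1 if Item contains a standalone 1, otherwise 0
    Item1 = as.integer(str_detect(Item, "(?<!\\d)1(?!\\d)"))
  )
# # A tibble: 6 x 4
#   ID    Time  Item           Item1
#   <chr> <chr> <chr>          <int>
# 1 S001  P1    1/2/               1
# 2 S002  P1    2/10/7/9           0
# 3 S003  P1    1/2/4/5/6/10/9     1
# 4 S004  P1    1/2/5/6/10/7/9     1
# 5 S005  P1    1/2/10/7/9         1
# 6 S006  P1    2/5/6/7/9          0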
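The answer mentions grepl as an equivalent without showing it. The following is a minimal base R sketch of that variant, extended to all ten columns with a loop. The data frame name dat and the loop variable k are assumptions made for illustration; the only real requirement is perl = TRUE, because the default regex engine used by grepl does not support lookarounds.

```r
# Sample data from the question (Item kept as character).
dat <- data.frame(
  ID   = c("S001", "S002", "S003", "S004", "S005", "S006"),
  Time = "P1",
  Item = c("1/2/", "2/10/7/9", "1/2/4/5/6/10/9",
           "1/2/5/6/10/7/9", "1/2/10/7/9", "2/5/6/7/9"),
  stringsAsFactors = FALSE
)

for (k in 1:10) {
  # Match k only when it is not preceded or followed by another digit,
  # so that 1 does not match inside 10.
  pattern <- sprintf("(?<!\\d)%d(?!\\d)", k)
  # perl = TRUE enables lookarounds; as.integer turns TRUE/FALSE into 1/0.
  dat[[paste0("Item", k)]] <- as.integer(grepl(pattern, dat$Item, perl = TRUE))
}

dat
```

Each pass adds one indicator column, so dat ends up with Item1 through Item10 next to the original three columns.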

However, you can also use other tidyverse tools to get the end result you want. Here we:

Use separate_rows to put each Item on its own row, splitting on /
Use filter to drop the blank rows left by the trailing /
Use mutate to add a present column of 1s
Use spread to turn the rows back into the grid of values you want
Replace the NA values with 0.

spread basically turns the values of Item into column headings and then puts the values of present into those new columns, leaving NA in the gaps.

tbl %>%
  separate_rows(Item, sep = "/") %>%
  filter(Item != "") %>%
  mutate(present = 1) %>%
  spread(Item, present, sep = "") %>%
  mutate_all(function(x) replace(x, is.na(x), 0))
# # A tibble: 6 x 10
#   ID    Time  Item1 Item10 Item2 Item4 Item5 Item6 Item7 Item9
#   <chr> <chr> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 S001  P1     1.00   0     1.00  0     0     0     0     0
# 2 S002  P1     0      1.00  1.00  0     0     0     1.00  1.00
# 3 S003  P1     1.00   1.00  1.00  1.00  1.00  1.00  0     1.00
# 4 S004  P1     1.00   1.00  1.00  0     1.00  1.00  1.00  1.00
# 5 S005  P1     1.00   1.00  1.00  0     0     0     1.00  1.00
# 6 S006  P1     0      0     1.00  0     1.00  1.00  1.00  1.00
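The same split-then-flag idea can be sketched in base R as well. This is only an illustration under stated assumptions: the named vector items and the matrix flags are hypothetical names, and the item numbers are assumed to run from 1 to 10. Unlike the spread version, it produces a column for every number in that range, including ones that never occur (such as 3 and 8), and keeps the columns in numeric order.

```r
# Item strings from the question, named by ID.
items <- c(
  S001 = "1/2/",
  S002 = "2/10/7/9",
  S003 = "1/2/4/5/6/10/9",
  S004 = "1/2/5/6/10/7/9",
  S005 = "1/2/10/7/9",
  S006 = "2/5/6/7/9"
)

# Split each string on "/"; strsplit drops the empty piece after a trailing "/".
parts <- strsplit(items, "/", fixed = TRUE)

# For each ID, flag which of the numbers 1..10 appear among its pieces.
flags <- t(sapply(parts, function(p) as.integer(1:10 %in% as.integer(p))))
colnames(flags) <- paste0("Item", 1:10)

flags
```

The result is an integer matrix with one row per ID, which can be attached to the original data with cbind if a data frame is needed.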


Answer 2

A solution using dplyr and tidyr.

library(dplyr)
library(tidyr)

dat2 <- dat %>%
  separate_rows(Item, convert = TRUE) %>%
  mutate(Value = 1L) %>%
  complete(ID, Time, Item = 1:10, fill = list(Value = 0L)) %>%
  mutate(Item = paste0("Item", Item)) %>%
  spread(Item, Value) %>%
  select(ID, Time, paste0("Item", 1:10))
dat2
# # A tibble: 6 x 12
#   ID    Time  Item1 Item2 Item3 Item4 Item5 Item6 Item7 Item8 Item9 Item10
#   <chr> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int>  <int>
# 1 S001  P1        1     1     0     0     0     0     0     0     0      0
# 2 S002  P1        0     1     0     0     0     0     1     0     1      1
# 3 S003  P1        1     1     0     1     1     1     0     0     1      1
# 4 S004  P1        1     1     0     0     1     1     1     0     1      1
# 5 S005  P1        1     1     0     0     0     0     1     0     1      1
# 6 S006  P1        0     1     0     0     1     1     1     0     1      0
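In more recent tidyr releases spread is superseded by pivot_wider. The following is a hedged sketch of the same pipeline written with pivot_wider, assuming tidyr 1.0.0 or later is installed; the result name wide is an assumption made for illustration, and the data are rebuilt inline so the block runs on its own. Because complete returns the Item values in sorted order, the new columns are expected to come out as Item1 through Item10 without a separate select step.

```r
library(dplyr)
library(tidyr)

# Same data as in this answer (no trailing "/" on the first row).
dat <- data.frame(
  ID   = c("S001", "S002", "S003", "S004", "S005", "S006"),
  Time = "P1",
  Item = c("1/2", "2/10/7/9", "1/2/4/5/6/10/9",
           "1/2/5/6/10/7/9", "1/2/10/7/9", "2/5/6/7/9"),
  stringsAsFactors = FALSE
)

wide <- dat %>%
  # One row per item; convert = TRUE turns the pieces into integers.
  separate_rows(Item, sep = "/", convert = TRUE) %>%
  mutate(Value = 1L) %>%
  # Add the missing ID/Item combinations for items 1..10, filled with 0.
  complete(nesting(ID, Time), Item = 1:10, fill = list(Value = 0L)) %>%
  # Spread the item numbers into columns named Item1, Item2, ...
  pivot_wider(names_from = Item, values_from = Value, names_prefix = "Item")

wide
```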

Data

dat <- read.table(text = "   ID Time Item
1 S001 P1 '1/2'
2 S002 P1 '2/10/7/9'
3 S003 P1 '1/2/4/5/6/10/9'
4 S004 P1 '1/2/5/6/10/7/9'
5 S005 P1 '1/2/10/7/9'
6 S006 P1 '2/5/6/7/9'",
                  header = TRUE, stringsAsFactors = FALSE)

How to add a column identifying whether another column contains a value

2 Answers: