Unpack and rearrange data

Time: 2019-07-19 16:43:37

Tags: r data.table

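I'm learning arules, and I need to convert my current data into a matrix with as.matrix.

I'm trying to split the items apart and then flag each one with 0 or 1.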

library(data.table)
DT <- data.table(ID=c("dog","dog","dog","cat","cat","bird"),
                 place=c("F-A-C","A-B-E","H-A","A-I-C-D","B-A","D-K-H-F"),
                 stringsAsFactors = FALSE)

I found this approach, but it doesn't give the result I want:

library(stringr)
library(plyr)
DTa <- str_split(DT$place, "-")
DTa <- ldply(DTa ,rbind)
DT <- cbind(DT$ID, DTa)

output:
DT$ID     1   2   3   4
1   dog   F   A   C   NA
2   dog   A   B   E   NA
3   dog   H   A   NA  NA
4   cat   A   I   C   D
5   cat   B   A   NA  NA
6   bird  D   K   H   F

I'd like the result to look like this:

    DT$ID     A  B  C  D  E  F  G  H  I ..... K
1    dog      1  1  1  0  1  1  0  1  0 ..... 0
2    cat      1  1  1  1  0  0  0  0  1 ..... 0
3    bird     0  0  0  1  0  1  0  1  0 ..... 1

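In the original data the items might run A-I, A-Z, or A-Q; I don't know how many there will be.

I also don't know how many IDs there will be.

So I can't fix the number of columns in advance like this: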

 str_split_fixed(DT$place, "-", 11)

What should I do, or what keywords should I search for to find out how to do this?

Thanks

2 Answers:

Answer 0: (score: 3)

An easier option is to use cSplit from splitstackshape to split into 'long' format, then dcast to 'wide' format, specifying fun.aggregate as a logical condition based on length

library(splitstackshape)
library(data.table)
dcast(cSplit(DT, "place", "-", 'long'), ID ~ place, function(x) as.integer(length(x) > 0))

Or as @Frank suggested

dcast(unique(cSplit(DT, "place", "-", 'long'))[, v := 1], ID ~ place, fill=0)

Or in tidyverse, split the column with separate_rows, get the distinct rows, create a column of 1s, and spread to 'wide' format

library(dplyr)
library(tidyr)
DT %>%
  separate_rows(place) %>%
  distinct(ID, place) %>%
  mutate(n = 1) %>%
  spread(place, n, fill = 0)

Or in base R, this can be done by splitting the 'place' column into a list and taking the table of the stacked vector
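The base R route mentioned above (split 'place' into a list, then tabulate) can be sketched roughly as follows. This is an illustrative reconstruction, not the answerer's own code: the names `items`, `long`, `tab`, and `res` are made up here, and the sample data is copied from the question as a plain data.frame so that no packages are needed.

```r
# Sample data from the question, as a plain data.frame (base R only)
DT <- data.frame(ID = c("dog", "dog", "dog", "cat", "cat", "bird"),
                 place = c("F-A-C", "A-B-E", "H-A", "A-I-C-D", "B-A", "D-K-H-F"),
                 stringsAsFactors = FALSE)

# Split each 'place' string on "-" into a list of character vectors
items <- strsplit(DT$place, "-", fixed = TRUE)

# Stack the list into long form: one row per (ID, item) pair
long <- data.frame(ID = rep(DT$ID, lengths(items)),
                   place = unlist(items),
                   stringsAsFactors = FALSE)

# Cross-tabulate ID by item; cells hold counts
tab <- table(long$ID, long$place)

# Collapse counts to a 0/1 presence indicator (plain integer matrix)
res <- 1L * (unclass(tab) > 0)
res
```

Because the columns come from whatever items actually occur, neither the number of items nor the number of IDs has to be known in advance.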

Answer 1: (score: 2)

A data.table solution:

dcast(DT[, unlist(lapply(.SD, strsplit, "-")), "ID"], ID ~ V1, value.var = "V1", fun.aggregate = length)
#      ID A B C D E F H I K
# 1: bird 0 0 0 1 0 1 1 0 1
# 2:  cat 2 1 1 1 0 0 0 1 0
# 3:  dog 3 1 1 0 1 1 1 0 0

This gives the counts ("length") rather than a yes/no flag. To reduce it to that:

dcast(DT[, unlist(lapply(.SD, strsplit, "-")), "ID"], ID ~ V1, value.var = "V1", fun.aggregate = length)[, lapply(.SD, min, 1), by = "ID"]
#      ID A B C D E F H I K
# 1: bird 0 0 0 1 0 1 1 0 1
# 2:  cat 1 1 1 1 0 0 0 1 0
# 3:  dog 1 1 1 0 1 1 1 0 0

I find it easier to read with magrittr's pipe:

library(magrittr)
DT[, unlist(lapply(.SD, strsplit, "-")), "ID"] %>%
  dcast(ID ~ V1, value.var = "V1", fun.aggregate = length) %>%
  .[, lapply(.SD, min, 1), by = "ID"]
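Since the question's end goal is a matrix, here is one possible way to go from the wide 0/1 table to a matrix. It is only a sketch under a few assumptions: the names `long`, `wide`, `m`, and the helper column `v` are hypothetical, duplicates are dropped before casting instead of clamping counts afterwards, and the installed data.table is recent enough that as.matrix accepts a `rownames` argument.

```r
library(data.table)

# Sample data from the question
DT <- data.table(ID = c("dog", "dog", "dog", "cat", "cat", "bird"),
                 place = c("F-A-C", "A-B-E", "H-A", "A-I-C-D", "B-A", "D-K-H-F"))

# Long form: one row per (ID, item) pair
long <- DT[, .(place = unlist(strsplit(place, "-", fixed = TRUE))), by = ID]

# Drop repeated pairs, mark each remaining pair with 1, cast to wide with 0 fill
wide <- dcast(unique(long)[, v := 1L], ID ~ place, value.var = "v", fill = 0L)

# Use the ID column as row names so the matrix holds only the 0/1 indicators
m <- as.matrix(wide, rownames = "ID")
m
```

Keeping ID as row names rather than as a column keeps the matrix numeric, which is generally what downstream matrix-based tools expect.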