I have:
id choice
----------
1 "a,b,c"
2 "c"
3 "a,c"
4 "b,c"
I need:
id a b c
---------
1 1 1 1
2 0 0 1
3 1 0 1
4 0 1 1
(or equivalent TRUE / FALSE values)
Is there a way to do this in R? I looked into strsplit, but it didn't seem to help.
Answer 0: (score: 6)
This is exactly what cSplit_e from my "splitstackshape" package was designed for.
library(splitstackshape)
cSplit_e(DF, "choice", sep = ",", mode = "binary",
type = "character", fill = 0, drop = TRUE)
# id choice_a choice_b choice_c
# 1 1 1 1 1
# 2 2 0 0 1
# 3 3 1 0 1
# 4 4 0 1 1
This uses DF from @G.Grothendieck's answer as input:
Lines <- 'id choice
----------
1 "a,b,c"
2 "c"
3 "a,c"
4 "b,c"'
DF <- read.table(text = Lines, header = TRUE, comment = "-", as.is = TRUE)
Answer 1: (score: 0)
Try this:
txt = 'id choice
----------
1 "a,b,c"
2 "c"
3 "a,c"
4 "b,c"'
library(dplyr)
txt %>% textConnection %>%
read.table(skip = 2, stringsAsFactors = FALSE) %>%
select(V2) %>% unlist %>%
strsplit("[,]") %>%
lapply(function(x) data.frame(t(table(c(x, "a", "b", "c"))>1))) %>%
bind_rows  # rbind_all() is deprecated in dplyr; bind_rows() is its replacement
Then you will get
a b c
1 TRUE TRUE TRUE
2 FALSE FALSE TRUE
3 TRUE FALSE TRUE
4 FALSE TRUE TRUE
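Answer 2: (score: 0)
Use strsplit to split choice, creating s, and give it DF$id as names. Extract the vector of all levels, all_lev, from s. Then sapply over s a function that creates a factor from each component of s and runs table on it. Finally, transpose the result.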
s <- setNames( strsplit(DF$choice, ","), DF$id )
all_lev <- sort(unique(unlist(s)))
m <- t(sapply(s, function(x) table(factor(x, levels = all_lev))))
This gives the following matrix, whose row names are the ids:
> m
a b c
1 1 1 1
2 0 0 1
3 1 0 1
4 0 1 1
If you prefer a data frame, build it from the m above:
data.frame(id = rownames(m), m)
Note 1: If we know that the levels are always "a", "b" and "c", we can hard-code all_lev, which shortens the code to:
s <- setNames( strsplit(DF$choice, ","), DF$id )
m <- t(sapply(s, function(x) table(factor(x, levels = c("a", "b", "c")))))
Note 2: We assume DF is this:
Lines <- 'id choice
----------
1 "a,b,c"
2 "c"
3 "a,c"
4 "b,c"'
DF <- read.table(text = Lines, header = TRUE, comment = "-", as.is = TRUE)
Update: Shortened the answer.
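The question also accepts TRUE / FALSE values in place of 1 / 0. A minimal sketch of that variant follows. It reuses the s and all_lev idea from this answer but tests membership with %in% instead of running table. The name m_lgl is introduced here purely for illustration and is not part of the original answer, and DF is rebuilt inline (an assumption that matches the sample data) so that the block runs on its own:

```r
# Assumed input: same content as the DF used in this answer, rebuilt directly
DF <- data.frame(id = 1:4,
                 choice = c("a,b,c", "c", "a,c", "b,c"),
                 stringsAsFactors = FALSE)

# Split each choice string on commas and name the pieces by id
s <- setNames(strsplit(DF$choice, ","), DF$id)

# All distinct levels that occur anywhere in the data
all_lev <- sort(unique(unlist(s)))

# For each id, mark which levels are present; vapply returns one column per id,
# so transpose to get one row per id
m_lgl <- t(vapply(s, function(x) all_lev %in% x, logical(length(all_lev))))
colnames(m_lgl) <- all_lev

# Logical data frame with an explicit id column
data.frame(id = DF$id, m_lgl)
```

Because vapply declares the expected result type up front, a malformed element of s raises an error instead of silently changing the shape of the result.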
Answer 3: (score: 0)
This assumes, as @kohske did, that your data actually looks the way you presented it. If it doesn't, please use dput to share your data in the future:
txt = 'id choice
----------
1 "a,b,c"
2 "c"
3 "a,c"
4 "b,c"'
dat <- setNames(read.table(text=txt, skip = 2, stringsAsFactors = FALSE),
strsplit(strsplit(txt, "\n")[[1]][1], "\\s+")[[1]]
)
library(qdapTools)
matrix2df(mtabulate(unlist(lapply(split(dat[[2]], dat[[1]]),
strsplit, ",\\s*"), recursive=FALSE)), "id")
I have come to dislike nested calls since getting acquainted with magrittr's pipe %>%, so here is the same thing using pipes:
library(magrittr)
txt %>% read.table(text=., skip = 2, stringsAsFactors = FALSE) %>%
setNames(strsplit(strsplit(txt, "\n")[[1]][1], "\\s+")[[1]]) %>%
with(split(choice, id)) %>%
lapply(strsplit, ",\\s*") %>%
unlist(recursive=FALSE) %>%
mtabulate %>%
matrix2df("id")
## id a b c
## 1 1 1 1 1
## 2 2 0 0 1
## 3 3 1 0 1
## 4 4 0 1 1