Pivot table with concatenated strings in R

Date: 2016-12-12 01:43:18

Tags: r

I have the following dataset:

mydata<- data.frame(Factors= c("a,b" , "c,d" , "a,c"), Valu = c ("2,3" , "7,8" , "9,1"))



   Factors Valu
1     a,b  2,3
2     c,d  7,8
3     a,c  9,1

I would like to convert it into the following, with every value listed under each of its factors:

My desired output:

a   b  c  d
2   2  7  7
3   3  8  8
9      9
1      1

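I need a pivot table, but I have to prepare the data first and then use melt/dcast to get the output I want. One of my failed attempts at preparing the data: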

mydata2 <- cSplit(mydata, c("Factors","Valu") , ",", "long")
But the factors and values lose their connection.

4 Answers:

Answer 0: (score: 6)

Here is a one-liner using cSplit:

library(splitstackshape)
with(cSplit(cSplit(mydata, 1, ",", "long"), 2, ",", "long"), split(Valu, Factors))
#$a
#[1] 2 3 9 1

#$b
#[1] 2 3

#$c
#[1] 7 8 9 1

#$d
#[1] 7 8

If we need a data.table/data.frame, use dcast to convert from 'long' format to 'wide'.

dcast(cSplit(cSplit(mydata, 1, ",", "long"), 2, ",", "long"), 
           rowid(Factors)~Factors, value.var="Valu")[, Factors := NULL][]
#   a  b c  d
#1: 2  2 7  7
#2: 3  3 8  8
#3: 9 NA 9 NA
#4: 1 NA 1 NA

Note: splitstackshape loads data.table. Here we used data.table_1.10.0. dcast from data.table is also very fast.

Answer 1: (score: 4)

Using a few *apply calls, strsplit, and grep:

## convert columns to characters so you can use strsplit
mydata$Factors <- as.character(mydata$Factors)
mydata$Valu <- as.character(mydata$Valu)


## get all the unique factor values by splitting them 
f <- unique(unlist(strsplit(unique(mydata$Factors), split = ",")))

## filter 'mydata' by using 'grep' to search for each individual factor value
## (using sapply for one at a time)
l <- sapply(f, function(x) mydata[grep(x, mydata$Factors), "Valu"])

This gives a list in which each element is named by a 'Factor' value and contains all the 'Valu' values associated with it:

l
# $a
# [1] "2,3" "9,1"
# 
# $b
# [1] "2,3"
# 
# $c
# [1] "7,8" "9,1"
# 
# $d
# [1] "7,8"

Another lapply over this list splits the 'Valu' strings:

result <- lapply(l, function(x) unlist(strsplit(x, split = ",")))

result
# $a
# [1] "2" "3" "9" "1"
# 
# $b
# [1] "2" "3"
# 
# $c
# [1] "7" "8" "9" "1"
# 
# $d
# [1] "7" "8"

Edit

To get the result as a data.frame, you can make every list element the same length (by padding with NA) and then call data.frame on the result:

## the number of rows required for each column
maxLength <- max(sapply(result, length))

## append 'NA's to list elements with fewer than maxLength elements
result <- data.frame(sapply(result, function(x) c(x, rep(NA, maxLength - length(x))) ))
result
#     a    b c    d
#   1 2    2 7    7
#   2 3    3 8    8
#   3 9 <NA> 9 <NA>
#   4 1 <NA> 1 <NA>
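
As a side note, the padding step can also be written with base R's `length<-` replacement function, which extends a vector with NA when given a longer length. The sketch below is only an illustration, not part of the original answer; it rebuilds a small `result` list by hand, and the names `padded` and `df` are made up for the example.

```r
# A hand-built stand-in for the 'result' list from the answer above.
result <- list(a = c("2", "3", "9", "1"),
               b = c("2", "3"),
               c = c("7", "8", "9", "1"),
               d = c("7", "8"))

# Number of rows required for each column.
maxLength <- max(lengths(result))

# `length<-` pads each shorter vector with NA up to maxLength.
padded <- lapply(result, `length<-`, maxLength)

# Equal-length named vectors convert directly to a data.frame.
df <- as.data.frame(padded, stringsAsFactors = FALSE)
print(df)
```

This keeps the columns as character vectors; wrap the values in as.integer first if numeric columns are wanted.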

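Edit

In reply to a comment: if you have 'similar' strings, you can make the grep pattern explicit by wrapping it in a ( ) regex group (see any regex cheatsheet for an explanation):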

mydata<- data.frame(Factors= c("a,b" , "c,d" , "a,c", "bo,ao"), Valu = c ("2,3" , "7,8" , "9,1", "x,y"))

mydata$Factors <- as.character(mydata$Factors)
mydata$Valu <- as.character(mydata$Valu)

f <- unique(unlist(strsplit(unique(mydata$Factors), split = ",")))

## filter 'mydata' by using 'grep' to search for each individual factor value
## (using sapply for one at a time)
l <- sapply(f, function(x) mydata[grep(paste0("(",x,")"), mydata$Factors), "Valu"])
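
One caveat worth checking: a ( ) group only groups the pattern, so "(a)" still matches inside "ao". If whole-token matches are what you need, one possible approach (a sketch, not the answer author's method) is to split each Factors string and test membership with %in%. The helper `has_token` and the name `l_exact` are hypothetical names introduced for this example.

```r
mydata <- data.frame(
  Factors = c("a,b", "c,d", "a,c", "bo,ao"),
  Valu    = c("2,3", "7,8", "9,1", "x,y"),
  stringsAsFactors = FALSE
)

# The grouped pattern still matches row 4, because "ao" contains "a".
grouped_hits <- grep("(a)", mydata$Factors)

# Split each Factors string once into its comma-separated tokens.
tokens <- strsplit(mydata$Factors, ",", fixed = TRUE)
f <- unique(unlist(tokens))

# TRUE for rows whose token set contains exactly x (hypothetical helper).
has_token <- function(x) vapply(tokens, function(tk) x %in% tk, logical(1))

# Named list: each factor value maps to the 'Valu' strings of its rows.
l_exact <- lapply(setNames(f, f), function(x) mydata$Valu[has_token(x)])
print(l_exact[["a"]])
```

With this version "a" picks up only the rows "a,b" and "a,c", while "ao" and "bo" each get the "x,y" row.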

Answer 2: (score: 4)

Another base R attempt:

# character conversion first
mydata[] <- lapply(mydata, as.character)

long <- do.call(rbind, 
  do.call(Map, c(expand.grid, lapply(mydata, strsplit, ","), stringsAsFactors=FALSE))
)
split(long$Valu, long$Factors)

#$a
#[1] "2" "3" "9" "1"
#
#$b
#[1] "2" "3"
#
#$c
#[1] "7" "8" "9" "1"
#
#$d
#[1] "7" "8"
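
To see why this keeps the factors and values connected, it may help to look at what expand.grid does for a single row. The sketch below is an illustration only; it takes the first row of the example data, already split on commas, and the variable names are made up for the example.

```r
# One row of the example data, already split on commas.
factors_row <- c("a", "b")
valu_row    <- c("2", "3")

# expand.grid() returns every Factors/Valu combination for that row,
# with the first argument varying fastest.
one_row <- expand.grid(Factors = factors_row, Valu = valu_row,
                       stringsAsFactors = FALSE)
print(one_row)
```

Map applies this to each row of mydata, and rbind stacks the per-row results into the long table that split then groups by factor.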

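Answer 3: (score: 4)

I misunderstood in my comment above; if you want every Factor matched with every Valu, you need to separate the two columns individually to get the combinations. Once you add indices to spread by, it's not bad: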

library(tidyverse)

mydata %>% 
    separate_rows(Factors) %>% separate_rows(Valu, convert = TRUE) %>%
    # add indices to give row order when spreading 
    group_by(Factors) %>% mutate(i = row_number()) %>%
    spread(Factors, Valu) %>% 
    select(-i)    # clean up extra column

## # A tibble: 4 × 4
##       a     b     c     d
## * <int> <int> <int> <int>
## 1     2     2     7     7
## 2     3     3     8     8
## 3     9    NA     9    NA
## 4     1    NA     1    NA