Split arrow-delimited values in a data frame into an unequal number of columns using R?

Asked: 2014-10-15 06:20:56

Tags: r csv strsplit

I have a data frame containing the following sample values.

[1] "entry.cei"                                                                               
[2] "entry.lifecycle->hist.open.personal demand chequing account->exit.lifecycle->entry.cei"  
[3] "entry.lifecycle->hist.open.personal demand savings account->exit.lifecycle->entry.cei"   
[4] "entry.transaction->txn.no source available->exit.transaction->entry.cei"                 
[5] "entry.branch->exit.branch->entry.transaction->txn.in-branch->exit.transaction->entry.cei"

I need to split them on "->" and put the pieces into separate columns, such as V1, V2, and so on. For example:

           V1                             V2               V3             V4           V5     V6    V7
1   entry.cei   
2   entry.lifecycle hist.open.personal demand chequing account  exit.lifecycle  entry.cei   
3   entry.lifecycle hist.open.personal demand savings account   exit.lifecycle  entry.cei   

How can I achieve this in R? I tried using rbind together with strsplit(), but I think it requires every row to have the same number of columns.

1 Answer:

Answer 0 (score: 1)

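The easiest way is to use gsub to replace each -> with a comma and then read the result with read.csv. If your data already contain commas, replace -> with > instead and pass sep = ">" so that > is used as the field separator; that should work as long as > does not appear anywhere else in the data.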

read.csv(text = gsub("->", ",", x, fixed = TRUE), header = FALSE)
#                  V1                                         V2                V3            V4               V5        V6
# 1         entry.cei                                                                                                      
# 2   entry.lifecycle hist.open.personal demand chequing account    exit.lifecycle     entry.cei                           
# 3   entry.lifecycle  hist.open.personal demand savings account    exit.lifecycle     entry.cei                           
# 4 entry.transaction                    txn.no source available  exit.transaction     entry.cei                           
# 5      entry.branch                                exit.branch entry.transaction txn.in-branch exit.transaction entry.cei

Or

read.table(text = gsub("->", ",", x, fixed = TRUE), sep = ",", fill = TRUE)
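One caveat with the read.csv/read.table route: unless col.names is supplied, the number of columns is determined from the first five lines of input, so a longer row further down can be wrapped onto an extra line. The sample above happens to have its longest row within the first five lines. The following is a minimal sketch of one way to guard against that, assuming > never occurs on its own in the data; the vector x2 and the names sep_char, txt, n and res are made up for illustration and are not part of the original question.

```r
# Hypothetical input: the longest row comes after line 5, and one value contains a comma.
x2 <- c("a->b", "a", "a", "a", "a", "a, with comma->b->c->d")

# Use ">" as a single-character separator (assumed not to occur otherwise in the data).
sep_char <- ">"
txt <- gsub("->", sep_char, x2, fixed = TRUE)

# Count the fields on every line so that all columns can be declared up front.
tc <- textConnection(txt)
n <- max(count.fields(tc, sep = sep_char, quote = "", comment.char = ""))
close(tc)

# Supplying col.names fixes the number of columns; fill = TRUE pads short rows.
res <- read.table(text = txt, sep = sep_char, fill = TRUE, header = FALSE,
                  quote = "", comment.char = "",
                  col.names = paste0("V", seq_len(n)),
                  stringsAsFactors = FALSE)
```

Here quote = "" and comment.char = "" are set so that quote characters and # in the data are treated as ordinary text; drop them if your data rely on quoting.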

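You can also still use rbind with strsplit, as long as you first make all of the list elements the same length. The length<- replacement function can help with that: assigning a longer length to a vector pads it with NA.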

s <- strsplit(x, "->", fixed = TRUE)
data.frame(do.call(rbind, lapply(s, `length<-`, max(sapply(s, length)))))
#                  X1                                         X2                X3            X4               X5        X6
# 1         entry.cei                                       <NA>              <NA>          <NA>             <NA>      <NA>
# 2   entry.lifecycle hist.open.personal demand chequing account    exit.lifecycle     entry.cei             <NA>      <NA>
# 3   entry.lifecycle  hist.open.personal demand savings account    exit.lifecycle     entry.cei             <NA>      <NA>
# 4 entry.transaction                    txn.no source available  exit.transaction     entry.cei             <NA>      <NA>
# 5      entry.branch                                exit.branch entry.transaction txn.in-branch exit.transaction entry.cei
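If this is needed in more than one place, the padding approach can be wrapped in a small helper. This is only a sketch: the function name pad_split, its sep argument, and the V1, V2, ... column naming are assumptions chosen to match the layout requested in the question, not something taken from the answer above.

```r
# Hypothetical helper: split each string on a fixed separator and pad short rows with NA.
pad_split <- function(x, sep = "->") {
  parts <- strsplit(x, sep, fixed = TRUE)
  # Width of the widest row decides how many columns are needed.
  width <- max(vapply(parts, length, integer(1)))
  # `length<-` extends each vector to `width`, filling the new slots with NA.
  padded <- lapply(parts, `length<-`, width)
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- paste0("V", seq_len(width))
  out
}

# Small usage example with made-up values.
demo <- pad_split(c("entry.cei", "entry.lifecycle->exit.lifecycle->entry.cei"))
```

Unlike the read.csv version, missing cells come back as NA rather than as empty strings, and no separator character has to be reserved.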

The original x vector is

x <- c("entry.cei", 
 "entry.lifecycle->hist.open.personal demand chequing account->exit.lifecycle->entry.cei", 
 "entry.lifecycle->hist.open.personal demand savings account->exit.lifecycle->entry.cei", 
 "entry.transaction->txn.no source available->exit.transaction->entry.cei", 
 "entry.branch->exit.branch->entry.transaction->txn.in-branch->exit.transaction->entry.cei")