Question

I have an R data frame with the following contents:

column1          column2
score1...test1     10
score2...test2     11
score3...test3     15

I would like to reshape my data frame into the following:

column1          column2    score     test
score1...test1     10       score1    test1
score2...test2     11       score2    test2
score3...test3     15       score3    test3

I tried using

library(stringr)
temp=str_split_fixed(df, " ...", 4)

but I got this:

 [,1] [,2] [,3] [,4]

How can I fix this?

Answer 1

You can do

library(splitstackshape)
cSplit(df, 1, "...", drop=F)
#           column1 column2 column1_1 column1_2
# 1: score1...test1      10    score1     test1
# 2: score2...test2      11    score2     test2
# 3: score3...test3      15    score3     test3

Or use setnames(cSplit(df, 1, "...", drop=F), 3:4, c("score", "test"))[] if you need to supply custom names (setnames comes from the data.table package).

Answer 2

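We can do this in base R. Use sub to replace the ... with a comma, read the resulting strings with read.csv to create a data.frame with two columns, and cbind it with the original dataset to get the expected output.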

  cbind(df,read.csv(text=sub('[[:punct:]]+', ',', 
      df$column1), header=FALSE, col.names=c('score', 'test')))
 #          column1 column2  score  test
 #1 score1...test1      10 score1 test1
 #2 score2...test2      11 score2 test2
 #3 score3...test3      15 score3 test3
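
To make the intermediate steps visible, here is a minimal sketch that splits the one-liner above into three stages. It assumes df is built as shown in the question; the names csv_lines, parts and result are only illustrative, and stringsAsFactors = FALSE is added so the new columns stay character in older R versions.

```r
# Assumed reconstruction of the data frame from the question
df <- data.frame(
  column1 = c("score1...test1", "score2...test2", "score3...test3"),
  column2 = c(10, 11, 15),
  stringsAsFactors = FALSE
)

# Step 1: replace the first run of punctuation in each string with a comma
csv_lines <- sub("[[:punct:]]+", ",", df$column1)
# "score1,test1" "score2,test2" "score3,test3"

# Step 2: parse the comma-separated strings into a two-column data.frame
parts <- read.csv(text = csv_lines, header = FALSE,
                  col.names = c("score", "test"),
                  stringsAsFactors = FALSE)

# Step 3: bind the new columns to the original data
result <- cbind(df, parts)
```

Note that sub only replaces the first match, so this works as long as the separator is the first punctuation in each value.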

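Or we can use ... directly as the pattern in sub (with fixed=TRUE, so the dots are matched literally), replace it with ',', and the rest is as above.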

 cbind(df,read.csv(text=sub('...', ',', 
      df$column1, fixed=TRUE), header=FALSE, 
         col.names=c('score', 'test')))
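
The reason fixed=TRUE matters here can be seen on a single string. This is a small illustration; the variable names are hypothetical:

```r
x <- "score1...test1"

# As a regular expression, "." matches any character, so "..." matches
# the first three characters of the string ("sco")
regex_result <- sub("...", ",", x)
# ",re1...test1"

# With fixed = TRUE the pattern is taken literally as three dots
fixed_result <- sub("...", ",", x, fixed = TRUE)
# "score1,test1"

# Escaping the dots in a regular expression gives the same result
escaped_result <- sub("\\.\\.\\.", ",", x)
# "score1,test1"
```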

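If we need a package solution, separate from tidyr can be used. Its default separator matches any run of non-alphanumeric characters, so the ... does not need to be specified.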

library(tidyr)
separate(df, column1, into=c("score", "test"), remove=FALSE)
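
If staying with stringr, as in the question's attempt, is preferred, the following sketch shows one way it might be adapted. It assumes df is built as in the question. The changes from the original attempt are: pass the column rather than the whole data frame, wrap the pattern in fixed() so the dots are literal, and ask for 2 pieces instead of 4. The name parts is only illustrative.

```r
library(stringr)

# Assumed reconstruction of the data frame from the question
df <- data.frame(
  column1 = c("score1...test1", "score2...test2", "score3...test3"),
  column2 = c(10, 11, 15),
  stringsAsFactors = FALSE
)

# Split the column (not the data frame) on a literal "..." into 2 pieces;
# the result is a character matrix with one row per input string
parts <- str_split_fixed(df$column1, fixed("..."), 2)

# Attach the pieces as new columns
df$score <- parts[, 1]
df$test  <- parts[, 2]
```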
