(R) Parse a character vector and split it into two separate columns

Time: 2018-11-19 17:05:48

Tags: r regex parsing tidyr

I have a data frame whose character columns hold values in the form mean (sd), like this:

library(tibble)  # provides tribble()

table <- tribble(
  ~var1, ~var2,
  #------------
  "27.0 (3.1)", "171.4 (9.0)",
  "27.0 (3.2)", "176.8 (7.2)",
  "27.1 (3.0)", "165.0 (6.2)"
)

I would like to split each column into two columns, one for the mean and one for the sd, like this:

table_split <- tribble(
  ~var1_mean, ~var1_sd, ~var2_mean, ~var2_sd,
  #---------------------
  27.0, 3.1, 171.4, 9.0,
  27.0, 3.2, 176.8, 7.2,
  27.1, 3.0, 165.0, 6.2
)

So far I have tried tidyr::separate(table, var1, c("var1_mean", "var1_sd"), sep = " \\("), which only partly works because it does not remove the closing parenthesis.

2 answers:

Answer 0 (score: 1)

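Use separate as shown below. The separator "[ ()]+" splits on any run of spaces and parentheses, so each value yields three pieces: the mean, the sd, and an empty string after the closing parenthesis. The NA in into drops that empty third piece. Note that this requires tidyr 0.8.2 or later; earlier versions do not support NA in the into argument.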

library(dplyr)
library(tidyr)  

table %>% 
  separate(var1, into = c("mean1", "sd1", NA), sep = "[ ()]+") %>%
  separate(var2, into = c("mean2", "sd2", NA), sep = "[ ()]+")

giving:

# A tibble: 3 x 4
  mean1 sd1   mean2 sd2  
  <chr> <chr> <chr> <chr>
1 27.0  3.1   171.4 9.0  
2 27.0  3.2   176.8 7.2  
3 27.1  3.0   165.0 6.2 
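
The columns above are character, whereas the table_split in the question holds numbers. One possible way to close that gap is separate()'s convert = TRUE argument, which runs type.convert() on the new columns. The following is only a sketch under a few assumptions: the data frame is named `tbl` here (an illustrative name, chosen to avoid masking base::table()), the column names follow the question's desired output, and the two calls are written without the pipe so that only tibble and tidyr are needed.

```r
library(tibble)
library(tidyr)

# Same data as in the question; `tbl` is an illustrative name.
tbl <- tribble(
  ~var1,        ~var2,
  "27.0 (3.1)", "171.4 (9.0)",
  "27.0 (3.2)", "176.8 (7.2)",
  "27.1 (3.0)", "165.0 (6.2)"
)

# Split var1; NA drops the empty piece left after the closing parenthesis,
# and convert = TRUE turns the new character columns into numbers.
step1 <- separate(tbl, var1,
                  into = c("var1_mean", "var1_sd", NA),
                  sep = "[ ()]+", convert = TRUE)

# Same treatment for var2.
tbl_split <- separate(step1, var2,
                      into = c("var2_mean", "var2_sd", NA),
                      sep = "[ ()]+", convert = TRUE)

str(tbl_split)
```

As with the answer's code, the NA in into needs tidyr 0.8.2 or later.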

Answer 1 (score: 1)

In base R, you can do this:

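# Create the new names: one mean/sd pair per original column
nms <- paste0(c("mean", "sd"), rep(seq_len(ncol(table)), each = 2))

# Paste each row into one string, strip the parentheses, and re-parse it as a table
read.table(text = gsub("[()]", "", do.call(paste, table)), col.names = nms)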

  mean1 sd1 mean2 sd2
1  27.0 3.1 171.4 9.0
2  27.0 3.2 176.8 7.2
3  27.1 3.0 165.0 6.2
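
To see why the one-liner works, it can help to look at the intermediate values. The sketch below walks through the same three steps on a plain data.frame; the names `df`, `rows`, `clean`, and `out` are illustrative and not part of the original answer.

```r
# The question's data as a base data.frame; `df` is an illustrative name.
df <- data.frame(
  var1 = c("27.0 (3.1)", "27.0 (3.2)", "27.1 (3.0)"),
  var2 = c("171.4 (9.0)", "176.8 (7.2)", "165.0 (6.2)"),
  stringsAsFactors = FALSE
)

# 1. Paste the columns together row by row; paste() joins them with a space.
rows <- do.call(paste, df)

# 2. Remove every parenthesis, leaving whitespace-separated numbers.
clean <- gsub("[()]", "", rows)

# 3. Let read.table() parse the text; it infers numeric columns by itself.
out <- read.table(text = clean,
                  col.names = c("mean1", "sd1", "mean2", "sd2"))

str(out)
```

Because read.table() does the type inference, the resulting columns are numeric, unlike the character columns that separate() returns by default.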