R: Parsing a data.frame string variable into multiple variables

Time: 2015-06-24 14:32:49

Tags: r parsing

The data I collected through Amazon's Mechanical Turk has a column vector named "LifetimeApprovalRate". The column contains information like this:

head(ES$LifetimeApprovalRate)
[1] "100% (32/32)" "50% (16/32)" "100% (11/11)" "100% (4/4)"

I would like to use this information to create three new variables:

 ES$rate: "100%" "50%" "100%" "100%" 
 ES$approve: "32" "16" "11" "4"
 ES$total: "32" "32" "11" "4"

I'm afraid that everything I have tried produces unwieldy monster lists that are not useful for anything.

2 answers:

Answer 0: (score: 4)

You could try strsplit:

  nm1 <- c('rate', 'approve', 'total')
  ES[nm1] <- do.call(rbind,
             strsplit(as.character(ES$LifetimeApprovalRate),'[()/ ]+'))

  ES[nm1[-1]] <- lapply(ES[nm1[-1]], as.numeric) 
  ES
  #    LifetimeApprovalRate rate approve total
  #1         100% (32/32) 100%      32    32
  #2          50% (16/32)  50%      16    32
  #3         100% (11/11) 100%      11    11
  #4           100% (4/4) 100%       4     4
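
To make the split pattern concrete, here is a minimal base R sketch on two sample strings in the same format as the question's column (the vector `x` is made up for illustration). It shows the pieces strsplit returns for one element and why do.call(rbind, ...) gives a matrix with one column per new variable.

```r
# Two sample values in the same format as ES$LifetimeApprovalRate
x <- c("100% (32/32)", "50% (16/32)")

# Split on runs of "(", ")", "/" or space.
# A match at the end of the string does not leave an empty trailing piece.
parts <- strsplit(x, "[()/ ]+")
parts[[1]]   # "100%" "32" "32"

# Every element has three pieces, so rbind stacks them into a 2 x 3 character matrix
m <- do.call(rbind, parts)
dim(m)       # 2 3
```

This relies on every row splitting into exactly three pieces. If some rows had a different layout, rbind would recycle the shorter vectors (with a warning), so it is probably worth checking lengths(parts) first on real data.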

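A similar option using the devel version of data.table (i.e. v1.9.5) is shown below. Instructions for installing the devel version are here. Here, we use tstrsplit to split the column 'LifetimeApprovalRate' and assign the output columns to the new columns ('nm1'). There is also the option type.convert=TRUE to convert the column classes.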

 library(data.table)#v1.9.5+
 setDT(ES)[, (nm1):=tstrsplit(LifetimeApprovalRate,'[()/ ]+', type.convert=TRUE)]
 #   LifetimeApprovalRate rate approve total
 #1:         100% (32/32) 100%      32    32
 #2:          50% (16/32)  50%      16    32
 #3:         100% (11/11) 100%      11    11
 #4:           100% (4/4) 100%       4     4
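
For readers without data.table, the class conversion can be approximated in base R. The sketch below assumes that type.convert=TRUE in tstrsplit behaves like utils::type.convert with as.is = TRUE applied to each column; the names `x` and `out` are illustrative only.

```r
# Sample values in the same format as the question's column
x <- c("100% (32/32)", "50% (16/32)", "100% (11/11)", "100% (4/4)")

# Split into a character matrix, then turn it into a data.frame of strings
m <- do.call(rbind, strsplit(x, "[()/ ]+"))
out <- as.data.frame(m, stringsAsFactors = FALSE)
names(out) <- c("rate", "approve", "total")

# Let R pick a class per column; as.is = TRUE keeps non-numeric columns as character
out[] <- lapply(out, utils::type.convert, as.is = TRUE)
sapply(out, class)
```

With this input, 'approve' and 'total' come out as integer, while 'rate' stays character because of the "%" sign.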

Data

 ES <-  structure(list(LifetimeApprovalRate = structure(c(2L, 4L, 1L, 
 3L), .Label = c("100% (11/11)", "100% (32/32)", "100% (4/4)", 
 "50% (16/32)"), class = "factor")), .Names = "LifetimeApprovalRate",
 row.names = c(NA, -4L), class = "data.frame")

Answer 1: (score: 4)

tidyr's separate is also handy for this sort of thing:

 library(tidyr)
 > dat <- data.frame(x = 1:4,y = c("100% (32/32)", "50% (16/32)", "100% (11/11)", "100% (4/4)"))
 > separate(dat,y,c("rate","approve","total"),sep = "[()/ ]+",extra = "drop")
   x rate approve total
 1 1 100%      32    32
 2 2  50%      16    32
 3 3 100%      11    11
 4 4 100%       4     4
