The data I collected through Amazon's Mechanical Turk has a column named "LifetimeApprovalRate". The column contains information like this:
head(ES$LifetimeApprovalRate)
[1] "100% (32/32)" "50% (16/32)" "100% (11/11)" "100% (4/4)"
I would like to use this information to create three new variables:
ES$rate: "100%" "50%" "100%" "100%"
ES$approve: "32" "16" "11" "4"
ES$total: "32" "32" "11" "4"
I'm afraid that everything I have tried produces unmanageable monster lists that are not useful for anything.
Answer 0 (score: 4)
You can try strsplit:
nm1 <- c('rate', 'approve', 'total')
ES[nm1] <- do.call(rbind,
strsplit(as.character(ES$LifetimeApprovalRate),'[()/ ]+'))
ES[nm1[-1]] <- lapply(ES[nm1[-1]], as.numeric)
ES
#  LifetimeApprovalRate rate approve total
#1         100% (32/32) 100%      32    32
#2          50% (16/32)  50%      16    32
#3         100% (11/11) 100%      11    11
#4           100% (4/4) 100%       4     4
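To see why the pattern '[()/ ]+' yields exactly three pieces per row, here is a minimal base R sketch on a standalone vector. The names x, parts and m are illustrative only and are not part of the answer above.

```r
# Illustrative input: two of the values from the question.
x <- c("100% (32/32)", "50% (16/32)")

# Split on any run of "(", ")", "/" or space. A match at the very end of a
# string (the closing parenthesis) does not produce a trailing empty piece.
parts <- strsplit(x, "[()/ ]+")
parts[[1]]

# Stack the per-row pieces into a character matrix with one column per piece.
m <- do.call(rbind, parts)
colnames(m) <- c("rate", "approve", "total")
m
```

Because every row splits into the same number of pieces, rbind produces a rectangular matrix that can be assigned to several data frame columns at once.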
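A similar option using the devel version of data.table (i.e. v1.9.5) is below. Instructions for installing the devel version are here. Here, we use tstrsplit to split the column 'LifetimeApprovalRate' and assign the output columns to the new columns ('nm1'). There is also the option type.convert=TRUE to convert the column classes.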
library(data.table)  # v1.9.5+
setDT(ES)[, (nm1) := tstrsplit(LifetimeApprovalRate, '[()/ ]+', type.convert=TRUE)]
#   LifetimeApprovalRate rate approve total
#1:         100% (32/32) 100%      32    32
#2:          50% (16/32)  50%      16    32
#3:         100% (11/11) 100%      11    11
#4:           100% (4/4) 100%       4     4
ES <- structure(list(LifetimeApprovalRate = structure(c(2L, 4L, 1L,
3L), .Label = c("100% (11/11)", "100% (32/32)", "100% (4/4)",
"50% (16/32)"), class = "factor")), .Names = "LifetimeApprovalRate",
row.names = c(NA, -4L), class = "data.frame")
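In both approaches the rate column stays a character string such as "100%", since the percent sign prevents numeric conversion. If a numeric proportion is wanted, one possible sketch (not part of the original answer; rate_num is a hypothetical name and the vectors are copied from the example output) is:

```r
# Illustrative values copied from the example output above.
rate <- c("100%", "50%", "100%", "100%")
approve <- c(32, 16, 11, 4)
total <- c(32, 32, 11, 4)

# Drop the literal "%" and rescale to a proportion between 0 and 1.
rate_num <- as.numeric(sub("%", "", rate, fixed = TRUE)) / 100

# Sanity check: the parsed rate should agree with approve / total.
all.equal(rate_num, approve / total)
```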
Answer 1 (score: 4)
tidyr's separate is also handy for this sort of thing:
> library(tidyr)
> dat <- data.frame(x = 1:4, y = c("100% (32/32)", "50% (16/32)", "100% (11/11)", "100% (4/4)"))
> separate(dat, y, c("rate", "approve", "total"), sep = "[()/ ]+", extra = "drop")
  x rate approve total
1 1 100%      32    32
2 2  50%      16    32
3 3 100%      11    11
4 4 100%       4     4