Extract characters from a column and create new variables

Time: 2013-11-18 15:46:33

Tags: regex string r

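I have a column of strings from which I want to extract parts and create new columns. From the first column (pollutant) I want to extract o3, no2, nox, pm10, pm25 and coarse. From the same column I also want to extract the second-to-last character, which is the digit just before the closing parenthesis. The result I am after is shown in the lag and pollut columns of the sample data below:
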
structure(list(pollutant = structure(c(4L, 2L, 3L, 5L, 6L, 1L, 
5L), .Label = c("Lag(coarse10, 6)", "Lag(no210, 0)", "Lag(nox10, 0)", 
"Lag(o3T10, 0)", "Lag(pm1010, 1)", "Lag(pm2510, 4)"), class = "factor"), 
    Estimate = c(0.0043156, -0.0049645, -0.0010619, -0.0070243, 
    -0.009382, -0.0017919, -0.0070243), lag = c(0L, 0L, 0L, 1L, 
    4L, 6L, 1L), pollut = structure(c(4L, 2L, 3L, 5L, 6L, 1L, 
    5L), .Label = c("coarse", "no2", "nox", "o3", "pm10", "pm25"
    ), class = "factor")), .Names = c("pollutant", "Estimate", 
"lag", "pollut"), row.names = c(NA, -7L), class = "data.frame")

1 answer:

Answer 0: (score: 1)

You can use regular expressions (dat is the name of the data frame):

transform(dat, lag = gsub(".* (.)\\)", "\\1", pollutant),
               pollut = gsub(".*\\(([a-z0-9]+).*10\\,.*", "\\1", pollutant))

#          pollutant   Estimate lag pollut
# 1    Lag(o3T10, 0)  0.0043156   0     o3
# 2    Lag(no210, 0) -0.0049645   0    no2
# 3    Lag(nox10, 0) -0.0010619   0    nox
# 4   Lag(pm1010, 1) -0.0070243   1   pm10
# 5   Lag(pm2510, 4) -0.0093820   4   pm25
# 6 Lag(coarse10, 6) -0.0017919   6 coarse
# 7   Lag(pm1010, 1) -0.0070243   1   pm10
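
Note that the first pattern captures exactly one character, so lag comes back as a character string and a two-digit lag would not be extracted correctly. The following is a minimal sketch of a slightly stricter variant, not part of the original answer. It assumes, based only on the sample data, that every name ends in "10" (optionally preceded by an uppercase "T") and that the lag is a non-negative integer. The standalone vector pollutant is used here purely for illustration.

```r
# Sample strings copied from the pollutant column in the question
pollutant <- c("Lag(o3T10, 0)", "Lag(no210, 0)", "Lag(nox10, 0)",
               "Lag(pm1010, 1)", "Lag(pm2510, 4)", "Lag(coarse10, 6)",
               "Lag(pm1010, 1)")

# Capture all digits between the comma and the closing parenthesis,
# then convert to integer (assumption: a lag may have more than one digit)
lag <- as.integer(sub("^.*, *([0-9]+)\\)$", "\\1", pollutant))

# Capture the name after "Lag(", dropping an optional "T" and the
# trailing "10" (assumption: every name in the data ends this way)
pollut <- sub("^Lag\\(([a-z0-9]+)T?10,.*$", "\\1", pollutant)

data.frame(pollutant, lag, pollut)
```

Anchoring the patterns with ^ and $ means a string that does not follow the expected layout is returned unchanged by sub(), which makes unexpected input easier to spot.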