Question

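I have a column of strings from which I want to extract values into new columns. From the first column I want to extract o3, no2, nox, pm10, pm25 and coarse. I also want to extract the second-to-last character of each string in the same column, which is the lag number. The result I am after is shown in the sample data below, under the columns lag and pollut.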

structure(list(pollutant = structure(c(4L, 2L, 3L, 5L, 6L, 1L, 
5L), .Label = c("Lag(coarse10, 6)", "Lag(no210, 0)", "Lag(nox10, 0)", 
"Lag(o3T10, 0)", "Lag(pm1010, 1)", "Lag(pm2510, 4)"), class = "factor"), 
    Estimate = c(0.0043156, -0.0049645, -0.0010619, -0.0070243, 
    -0.009382, -0.0017919, -0.0070243), lag = c(0L, 0L, 0L, 1L, 
    4L, 6L, 1L), pollut = structure(c(4L, 2L, 3L, 5L, 6L, 1L, 
    5L), .Label = c("coarse", "no2", "nox", "o3", "pm10", "pm25"
    ), class = "factor")), .Names = c("pollutant", "Estimate", 
"lag", "pollut"), row.names = c(NA, -7L), class = "data.frame")

Answer 1

You can use regular expressions (dat is the name of the data frame):

transform(dat, lag = gsub(".* (.)\\)", "\\1", pollutant),
               pollut = gsub(".*\\(([a-z0-9]+).*10\\,.*", "\\1", pollutant))

#          pollutant   Estimate lag pollut
# 1    Lag(o3T10, 0)  0.0043156   0     o3
# 2    Lag(no210, 0) -0.0049645   0    no2
# 3    Lag(nox10, 0) -0.0010619   0    nox
# 4   Lag(pm1010, 1) -0.0070243   1   pm10
# 5   Lag(pm2510, 4) -0.0093820   4   pm25
# 6 Lag(coarse10, 6) -0.0017919   6 coarse
# 7   Lag(pm1010, 1) -0.0070243   1   pm10
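
Note that the lag pattern above captures a single character, so it returns a character column and would not handle lags of two or more digits. As a hedged variant (a sketch, not part of the original answer), the same idea can be written with anchored patterns and an integer conversion. It assumes every string has the form Lag(<name>10, <k>), with an optional single uppercase letter before the 10 suffix as in o3T10; the data frame is rebuilt here from the question's values with pollutant stored as character.

```r
# Rebuild the 'pollutant' column from the question (as character, not factor).
dat <- data.frame(
  pollutant = c("Lag(o3T10, 0)", "Lag(no210, 0)", "Lag(nox10, 0)",
                "Lag(pm1010, 1)", "Lag(pm2510, 4)", "Lag(coarse10, 6)",
                "Lag(pm1010, 1)"),
  stringsAsFactors = FALSE
)

# Lag: capture all digits between the comma and the closing parenthesis,
# then convert to integer so the column is numeric rather than character.
dat$lag <- as.integer(sub(".*,\\s*(\\d+)\\)$", "\\1", dat$pollutant, perl = TRUE))

# Pollutant: lazily capture the lowercase/digit name after "Lag(", skipping an
# optional uppercase letter (the "T" in "o3T10") and the "10" suffix before the comma.
dat$pollut <- sub("^Lag\\(([a-z0-9]+?)[A-Z]?10,.*$", "\\1", dat$pollutant, perl = TRUE)

print(dat)
```

The lazy quantifier `+?` needs `perl = TRUE`. It stops the name at the shortest prefix that is followed by the `10,` suffix, so pm1010 yields pm10 rather than pm.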

Extract characters from a column and create a new variable
