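I have a column containing strings, and I want to extract parts of it to create new columns. From the first column I want to extract o3, no2, nox, pm10, pm25 and coarse. I also want to extract the second-to-last character of the same column, which is the lag number. The result I want is shown in the columns lag and pollut of the sample data:
structure(list(pollutant = structure(c(4L, 2L, 3L, 5L, 6L, 1L,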
5L), .Label = c("Lag(coarse10, 6)", "Lag(no210, 0)", "Lag(nox10, 0)",
"Lag(o3T10, 0)", "Lag(pm1010, 1)", "Lag(pm2510, 4)"), class = "factor"),
Estimate = c(0.0043156, -0.0049645, -0.0010619, -0.0070243,
-0.009382, -0.0017919, -0.0070243), lag = c(0L, 0L, 0L, 1L,
4L, 6L, 1L), pollut = structure(c(4L, 2L, 3L, 5L, 6L, 1L,
5L), .Label = c("coarse", "no2", "nox", "o3", "pm10", "pm25"
), class = "factor")), .Names = c("pollutant", "Estimate",
"lag", "pollut"), row.names = c(NA, -7L), class = "data.frame")
Answer 0 (score: 1)
You can use regular expressions (dat is the name of the data frame):
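# lag: the single character just before the closing parenthesis
# pollut: the lowercase letters and digits after "(", without the trailing "10"
transform(dat, lag = gsub(".* (.)\\)", "\\1", pollutant),
               pollut = gsub(".*\\(([a-z0-9]+).*10\\,.*", "\\1", pollutant))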
#          pollutant   Estimate lag pollut
# 1    Lag(o3T10, 0)  0.0043156   0     o3
# 2    Lag(no210, 0) -0.0049645   0    no2
# 3    Lag(nox10, 0) -0.0010619   0    nox
# 4   Lag(pm1010, 1) -0.0070243   1   pm10
# 5   Lag(pm2510, 4) -0.0093820   4   pm25
# 6 Lag(coarse10, 6) -0.0017919   6 coarse
# 7   Lag(pm1010, 1) -0.0070243   1   pm10
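
As a variation on the approach above, here is a minimal sketch that splits the extraction into smaller steps and returns lag as an integer rather than a string. It assumes every value has the form Lag(<name>10, <lag>), possibly with one uppercase letter before the 10 (as in o3T10). The data frame is rebuilt here with only the pollutant column so the snippet runs on its own, and inner is just an illustrative helper name.

```r
# Rebuild the pollutant column from the question (values copied from the sample data).
dat <- data.frame(
  pollutant = c("Lag(o3T10, 0)", "Lag(no210, 0)", "Lag(nox10, 0)",
                "Lag(pm1010, 1)", "Lag(pm2510, 4)", "Lag(coarse10, 6)",
                "Lag(pm1010, 1)"),
  stringsAsFactors = FALSE
)

# Lag: capture the digits between the comma and the closing parenthesis,
# then convert them to integer.
dat$lag <- as.integer(sub(".*, *([0-9]+)\\)$", "\\1", dat$pollutant))

# Pollutant, step 1: keep the text between "(" and the comma, e.g. "o3T10".
inner <- sub("^[^(]*\\(([^,]+),.*$", "\\1", dat$pollutant)

# Pollutant, step 2: drop the trailing "10" and an optional uppercase letter before it.
dat$pollut <- sub("[A-Z]?10$", "", inner)

print(dat)
```

Unlike the single-character pattern, this version would also pick up lags with more than one digit, provided the assumed format holds.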