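How can I separate a column like this, where part of the data has a delimiter but the rest does not, and the substrings are of unequal length?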
Input:
    id
142 TM500A2013PISA8/22/17BG
143 TM500CAGE2012QUDO8/22/1720+
Output:
    site garden plot year species sampledate portion
142   TM    500    A 2013    PISA    8/22/17      BG
143   TM    500 CAGE 2012    QUDO    8/22/17     20+
I looked through other questions and tried something that might work if the strings were all the same length, i.e.:
    df <- avgmass %>%
      separate(id, c("site", "garden", "plot", "year", "species", "sampledate", "portion"),
               sep = cumsum(c(2, 3, 3, 4, 4, 5)))
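But since the plot ID is A, B, or CAGE, and the date contains "/", I'm not sure how to handle it.
Since I'm relatively new to R, I tried searching for more details on how to use the sep argument, but to no avail... Thanks for your help.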
Answer 0 (score: 0)
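The following code may work for you, assuming the "site", "garden", "year", and "species" fields have fixed widths and the plot is either "CAGE" or a single letter.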
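    library(dplyr)
    library(tidyr)

    df <- df %>%
      # Fixed-width fields: site is 2 characters, garden is 3.
      mutate(site = substr(id, 1, 2),
             garden = substr(id, 3, 5),
             # plot is "CAGE" (4 characters) or a single letter; "CAGE" shifts every later field by 3.
             plot = ifelse(substr(id, 6, 9) == "CAGE", substr(id, 6, 9), substr(id, 6, 6)),
             year = ifelse(substr(id, 6, 9) == "CAGE", substr(id, 10, 13), substr(id, 7, 10)),
             species = ifelse(substr(id, 6, 9) == "CAGE", substr(id, 14, 17), substr(id, 11, 14)),
             # Everything after species: the date followed by the portion code.
             sampledate = ifelse(substr(id, 6, 9) == "CAGE", substr(id, 18, nchar(id)), substr(id, 15, nchar(id)))) %>%
      # Split the date on "/"; y still carries the portion, e.g. "17BG".
      separate(sampledate, into = c("m", "d", "y"), sep = "/") %>%
      # The first 2 characters of y are the year; the rest is the portion.
      mutate(portion = substr(y, 3, nchar(y)),
             sampledate = as.Date(paste(m, d, substr(y, 1, 2), sep = "-"), format = "%m-%d-%y"),
             # Drop the temporary helper columns.
             m = NULL,
             d = NULL,
             y = NULL)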
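If the position arithmetic above gets hard to maintain, a single regular expression with one capture group per column might be easier to read. This is only a sketch, not a replacement for the code above: it uses base R's `utils::strcapture` instead of dplyr/tidyr, and the field shapes in the pattern (two-letter site, three-digit garden, four-letter species, two-digit year in the date) are assumptions inferred from the two sample rows in the question.

```r
# Sample ids from the question (row numbers 142 and 143 omitted).
df <- data.frame(id = c("TM500A2013PISA8/22/17BG", "TM500CAGE2012QUDO8/22/1720+"),
                 stringsAsFactors = FALSE)

# One capture group per output column. The field shapes are assumptions
# inferred from the sample rows, not a documented format.
pattern <- paste0(
  "^([A-Z]{2})",                        # site: two letters
  "([0-9]{3})",                         # garden: three digits
  "(CAGE|[A-Z])",                       # plot: "CAGE" or a single letter
  "([0-9]{4})",                         # year: four digits
  "([A-Z]{4})",                         # species: four letters
  "([0-9]{1,2}/[0-9]{1,2}/[0-9]{2})",   # sampledate: m/d/yy
  "(.*)$"                               # portion: whatever is left
)

# Prototype giving the name and type of each captured column.
proto <- data.frame(site = character(), garden = character(), plot = character(),
                    year = integer(), species = character(),
                    sampledate = character(), portion = character(),
                    stringsAsFactors = FALSE)

# Rows that do not match the pattern come back as NA in every column.
parts <- utils::strcapture(pattern, df$id, proto)
parts$sampledate <- as.Date(parts$sampledate, format = "%m/%d/%y")
print(parts)
```

Ids that do not fit the assumed shape end up as all-NA rows, so checking `anyNA(parts$site)` afterwards is a cheap way to spot rows that need a different pattern.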