我有一个带有文本列的数据框,我想将其拆分成多列,因为文本字符串包含多个变量,例如位置,教育,距离等。
数据帧:
text.string = c("&location=NY&distance=30&education=University",
"&location=CA&distance=30&education=Highschool&education=University",
"&location=MN&distance=10&industry=Healthcare",
"&location=VT&distance=30&education=University&industry=IT&industry=Business")
df = data.frame(text.string)
df
text.string
1 &location=NY&distance=30&education=University
2 &location=CA&distance=30&education=Highschool&education=University
3 &location=MN&distance=10&industry=Healthcare
4 &location=VT&distance=30&education=University&industry=IT&industry=Business
我可以使用cSplit
:cSplit(df, 'text.string', sep = "&")
:
text.string_1 text.string_2 text.string_3 text.string_4 text.string_5 text.string_6
1: NA location=NY distance=30 education=University NA NA
2: NA location=CA distance=30 education=Highschool education=University NA
3: NA location=MN distance=10 industry=Healthcare NA NA
4: NA location=VT distance=30 education=University industry=IT industry=Business
问题是文本字符串可能包含相同变量的倍数,或者某些变量缺少某些变量。使用cSplit
,每列的变量分组变得混杂起来。我想避免这种情况,并将它们组合在一起。
所以它与此类似(education
和industry
不再出现在多个列中):
text.string_1 text.string_2 text.string_3 text.string_4 text.string_5 text.string_6
1 NA location=NY distance=30 education=University <NA> NA
2 NA location=CA distance=30 education=Highschool education=University <NA> NA
3 NA location=MN distance=10 <NA> industry=Healthcare NA
4 NA location=VT distance=30 education=University industry=IT industry=Business NA
答案 0 :(得分:1)
考虑到@NicE评论: 这是一种方式,遵循您的示例:
library(data.table)
text.string = c("&location=NY&distance=30&education=University",
"&location=CA&distance=30&education=Highschool&education=University",
"&location=MN&distance=10&industry=Healthcare",
"&location=VT&distance=30&education=University&industry=IT&industry=Business")
clean <- strsplit(text.string, "&|=")
out <- lapply(clean, function(x){ma <- data.table(matrix(x[!x==""], nrow = 2, byrow = F ));
setnames(ma, as.character(ma[1,]));
ma[-1,]})
out <- rbindlist(out, fill = T)
out
location distance education education industry industry
1: NY 30 University NA NA NA
2: CA 30 Highschool University NA NA
3: MN 10 NA NA Healthcare NA
4: VT 30 University NA IT Business