I am struggling to build an organized data frame from a set of strings.
Given this input:
text = c('I do not want to do this thing anymore','you do not know what I mean','I will not do this thing','do not want anymore','you will see')
[1] "I do not want to do this thing anymore" "you do not know what I mean"
[3] "I will not do this thing" "do not want anymore"
[5] "you will see"
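I want to build a data frame that looks like a document-term table but keeps the word order of each sentence. However, I don't know how to achieve this. It is neither a document-term matrix nor the data frame that the following code creates.
library(stringi)
as.data.frame(t(stri_list2matrix(strsplit(as.character(text),' '))))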
   V1   V2   V3      V4   V5    V6   V7    V8      V9
1   I   do  not    want   to    do this thing anymore
2 you   do  not    know what     I mean  <NA>    <NA>
3   I will  not      do this thing <NA>  <NA>    <NA>
4  do  not want anymore <NA>  <NA> <NA>  <NA>    <NA>
5 you will  see    <NA> <NA>  <NA> <NA>  <NA>    <NA>
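For comparison, the same left-aligned table can be sketched in base R without stringi. This assumes words are separated by single spaces; `words`, `width`, and `padded_df` are illustrative names, not part of the original code.

```r
text <- c('I do not want to do this thing anymore',
          'you do not know what I mean',
          'I will not do this thing',
          'do not want anymore',
          'you will see')

# Split each sentence into words (assumption: single-space separators)
words <- strsplit(text, " ", fixed = TRUE)

# Indexing past the end of a vector yields NA, which pads short sentences
width <- max(lengths(words))
padded <- t(sapply(words, function(w) w[seq_len(width)]))

padded_df <- as.data.frame(padded, stringsAsFactors = FALSE)
```

Each word lands in the column of its position within its own sentence, so the same word in different sentences is not lined up, which is the limitation described above.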
What I intend to produce is this:
    V1   V2   V3   V4   V5   V6   V7   V8   V9  V10  V11   V12     V13  V14  V15  V16
1 <NA>    I   do <NA>  not <NA> <NA> want   to   do this thing anymore <NA> <NA> <NA>
2  you <NA>   do <NA>  not <NA> know <NA> <NA> <NA> <NA>  <NA>    <NA> what    I mean
3 <NA>    I <NA> will  not <NA> <NA> <NA> <NA>   do this thing    <NA> <NA> <NA> <NA>
4 <NA> <NA>   do <NA>  not <NA> <NA> want <NA> <NA> <NA>  <NA> anymore <NA> <NA> <NA>
5  you <NA> <NA> will <NA>  see <NA> <NA> <NA> <NA> <NA>  <NA>    <NA> <NA> <NA> <NA>
result = data.frame(V1=c(NA,"you",NA,NA,"you"),
V2=c("I",NA,"I",NA,NA),
V3=c("do","do",NA,"do",NA),
V4=c(NA,NA,"will",NA,"will"),
V5=c("not","not","not","not",NA),
V6=c(NA,NA,NA,NA,"see"),
V7=c(NA,"know",NA,NA,NA),
V8=c("want",NA,NA,"want",NA),
V9=c("to",NA,NA,NA,NA),
V10=c("do",NA,"do",NA,NA),
V11=c("this",NA,"this",NA,NA),
V12=c("thing",NA,"thing",NA,NA),
V13=c("anymore",NA,NA,"anymore",NA),
V14=c(NA,"what",NA,NA,NA),
V15=c(NA,"I",NA,NA,NA),
V16=c(NA,"mean",NA,NA,NA))
This way I can restore the original list of strings.
origin = do.call(paste, c(result, sep=" "))
origin = gsub('( NA|NA\\s*)','',origin)
origin
[1] "I do not want to do this thing anymore" "you do not know what I mean"
[3] "I will not do this thing" "do not want anymore"
[5] "you will see"
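One caveat with the `gsub` pattern above is that it also removes the letters "NA" inside real words written in upper case. A minimal sketch that drops missing cells before pasting avoids this; `demo` is a hypothetical two-row table in the same layout as `result`, and `restore_rows` is an illustrative helper name.

```r
# Hypothetical table in the same layout as `result`; NA marks an empty cell
demo <- data.frame(V1 = c(NA, "you"),
                   V2 = c("I", NA),
                   V3 = c("like", "see"),
                   V4 = c("NASA", NA),
                   stringsAsFactors = FALSE)

# Paste the non-missing cells of each row, left to right
restore_rows <- function(df) {
  m <- as.matrix(df)
  unname(apply(m, 1, function(r) paste(r[!is.na(r)], collapse = " ")))
}

restore_rows(demo)
# [1] "I like NASA" "you see"
```

Because the missing cells are filtered out as values rather than as text, no regular expression is needed.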
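Answer 0 (score: 0)
Please try the code below and let me know whether it serves your purpose. The only difference is that the word order in the output data frame is not the same as in yours.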
library(stringi)
text = c('I do not want to do this thing anymore','you do not know what I mean','I will not do this thing','do not want anymore','you will see')
# One row per sentence, one column per word position, padded with NA
tf = as.data.frame(t(stri_list2matrix(strsplit(as.character(text),' '))),stringsAsFactors = FALSE)
# Vocabulary: unique words in order of first appearance
strs = unlist(strsplit(as.character(text),' '))
fstrs = unique(strs)
fdf = data.frame(matrix(ncol = length(fstrs),nrow = 0))
names(fdf) = fstrs
log_out = data.frame()
for(i in seq_len(nrow(tf))){
  # Keep each vocabulary word that occurs in sentence i; NA otherwise
  log = as.data.frame(t(ifelse(names(fdf) %in% as.character(tf[i,]),names(fdf),NA)))
  log_out = rbind(log_out,log)
}
The output will be:
log_out
    V1   V2   V3   V4   V5   V6    V7      V8   V9  V10  V11  V12  V13  V14
1    I   do  not want   to this thing anymore <NA> <NA> <NA> <NA> <NA> <NA>
2    I   do  not <NA> <NA> <NA>  <NA>    <NA>  you know what mean <NA> <NA>
3    I   do  not <NA> <NA> this thing    <NA> <NA> <NA> <NA> <NA> will <NA>
4 <NA>   do  not want <NA> <NA>  <NA> anymore <NA> <NA> <NA> <NA> <NA> <NA>
5 <NA> <NA> <NA> <NA> <NA> <NA>  <NA>    <NA>  you <NA> <NA> <NA> will  see
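Note that this table has one column per unique word, so a repeated word such as the second "do" in row 1 is lost and the sentences cannot be restored from it. If the order matters, one possible approach is to merge the sentences into a shared column template with a longest-common-subsequence (LCS) merge and then place each sentence into that template. The following is only a sketch under that assumption: `merge_lcs` and `place_words` are hypothetical helpers, and the resulting column layout is not guaranteed to match the 16-column table in the question.

```r
text <- c('I do not want to do this thing anymore',
          'you do not know what I mean',
          'I will not do this thing',
          'do not want anymore',
          'you will see')
words <- strsplit(text, " ", fixed = TRUE)

# Merge two word vectors into a sequence that contains both as subsequences
merge_lcs <- function(a, b) {
  n <- length(a); m <- length(b)
  # L[i, j] = LCS length of the suffixes a[i..n] and b[j..m]
  L <- matrix(0L, n + 1, m + 1)
  for (i in rev(seq_len(n))) {
    for (j in rev(seq_len(m))) {
      if (a[i] == b[j]) {
        L[i, j] <- L[i + 1, j + 1] + 1L
      } else {
        L[i, j] <- max(L[i + 1, j], L[i, j + 1])
      }
    }
  }
  out <- character(0); i <- 1; j <- 1
  while (i <= n && j <= m) {
    if (a[i] == b[j]) {
      out <- c(out, a[i]); i <- i + 1; j <- j + 1   # shared word, emitted once
    } else if (L[i + 1, j] >= L[i, j + 1]) {
      out <- c(out, a[i]); i <- i + 1               # word only in a
    } else {
      out <- c(out, b[j]); j <- j + 1               # word only in b
    }
  }
  if (i <= n) out <- c(out, a[i:n])
  if (j <= m) out <- c(out, b[j:m])
  out
}

# Put each word in the leftmost free template column that holds the same word
place_words <- function(w, template) {
  row <- rep(NA_character_, length(template))
  pos <- 0
  for (x in w) {
    pos <- pos + which(template[(pos + 1):length(template)] == x)[1]
    row[pos] <- x
  }
  row
}

template <- Reduce(merge_lcs, words)
aligned_mat <- t(sapply(words, place_words, template = template))
aligned <- as.data.frame(aligned_mat, stringsAsFactors = FALSE)

# Round trip: dropping the NA cells gives back the original sentences
restored <- unname(apply(aligned_mat, 1,
                         function(r) paste(r[!is.na(r)], collapse = " ")))
```

Because the template contains every sentence as a subsequence, each row keeps its words in their original order, and `restored` equals `text`. The merge is greedy and pairwise, so the number of columns depends on the order in which the sentences are merged.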