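I have a data frame with a text column. I need to ignore or remove the first 2 words of each string and then count the resulting strings in that column.
b=data.frame(text=c("hello sunitha what can I do for you?","hi john what can I do for you?"))
Expected output for data frame "b": please suggest how to remove the first 2 words so that the result is "what can I do for you?" = 2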
Answer 0 (score: 1)
You can use gsub to remove the first two words and then use tapply to do the counting, i.e.
i1 <- gsub("^\\w*\\s*\\w*\\s*", "", b$text)
tapply(i1, i1, length)
#what can I do for you?
# 2
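As a minimal sketch of the same idea, the variant below uses sub and table instead of gsub and tapply. This is a swapped-in alternative, not the answer's own code. The variable names stripped and counts are illustrative, and the pattern uses + rather than *, so it assumes every string has at least two words, each followed by whitespace.

```r
# Same sample data as in the question.
b <- data.frame(
  text = c("hello sunitha what can I do for you?",
           "hi john what can I do for you?"),
  stringsAsFactors = FALSE
)

# ^\\w+\\s+\\w+\\s+ matches the first two words and the whitespace after each.
# sub() is enough because the pattern is anchored at the start of the string
# and can therefore match at most once.
stripped <- sub("^\\w+\\s+\\w+\\s+", "", b$text)

# table() counts how often each remaining string occurs.
counts <- table(stripped)
print(counts)
```

Because \\w matches only letters, digits, and underscores, a leading word that contains punctuation (for example "hi,") would not be removed by this pattern; splitting on spaces handles that case.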
If you need to remove an arbitrary range of words (here, words 2 to 4), we can modify i1 as follows,
i1 <- sapply(strsplit(as.character(b$text), ' '), function(i)paste(i[-c(2:4)], collapse = ' '))
tapply(i1, i1, length)
#hello I do for you? hi I do for you?
# 1 1
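The split-and-paste approach could be wrapped in a small reusable function, sketched below. The helper name drop_words and its arguments are hypothetical, not part of the answer. It uses setdiff rather than negative indexing so that an empty index vector keeps every word instead of dropping them all.

```r
# Hypothetical helper: drop the words at positions `idx` from each string in `x`.
drop_words <- function(x, idx) {
  # Split each string on single spaces; as.character() also covers factors.
  words <- strsplit(as.character(x), " ", fixed = TRUE)
  vapply(words, function(w) {
    # Keep every position that is not listed in idx.
    keep <- setdiff(seq_along(w), idx)
    paste(w[keep], collapse = " ")
  }, character(1))
}

text <- c("hello sunitha what can I do for you?",
          "hi john what can I do for you?")

first_two_removed <- drop_words(text, 1:2)  # drop words 1 and 2
range_removed <- drop_words(text, 2:4)      # drop words 2 to 4

# Count how often each resulting string occurs.
print(table(first_two_removed))
print(table(range_removed))
```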
Answer 1 (score: 0)
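library(magrittr)  # provides the %>% pipe used below
b=data.frame(text=c("hello sunitha what can I do for you?","hi john what can I do for you?"),stringsAsFactors = FALSE)
# Drop the first two words of each string and re-join the rest
b$processed = sapply(b$text, function(x) (strsplit(x," ")[[1]]%>%.[-c(1:2)])%>%paste0(.,collapse=" "))
# Count the words remaining in each processed string
b$count = sapply(b$processed, function(x) length(strsplit(x," ")[[1]]))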
> b
                                  text              processed count
1 hello sunitha what can I do for you? what can I do for you?     6
2       hi john what can I do for you? what can I do for you?     6
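Are you looking for something like this? Note stringsAsFactors = FALSE; otherwise (in R versions before 4.0.0, where TRUE was the default) your text will be of type factor and hard to work with.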
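To illustrate the factor caveat, here is a small sketch (variable names are illustrative): strsplit rejects a factor outright, so a factor column has to be converted with as.character first.

```r
# A factor holding one of the sample strings from the question.
f <- factor("hello sunitha what can I do for you?")

# strsplit() only accepts character input, so calling it on a factor fails.
result <- tryCatch(strsplit(f, " "), error = function(e) "error")
print(result)

# Converting to character first makes the split work as intended.
words <- strsplit(as.character(f), " ")[[1]]
print(words[-(1:2)])
```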