How to remove the first few words and count

Time: 2019-03-29 07:14:32

Tags: r

I have a data frame with a text column. I need to ignore or remove the first 2 words of each entry and then count the strings in that column.

b=data.frame(text=c("hello sunitha what can I do for you?","hi john what can I do for you?"))

Expected output for data frame `b`: please suggest how to remove the first 2 words so that I get "what can I do for you?" = 2.

2 answers:

Answer 0 (score: 1)

You can use `gsub` to remove the first two words and then `tapply` to count the results:

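# Remove the first two words (and the spaces after them) from each string
i1 <- gsub("^\\w*\\s*\\w*\\s*", "", b$text)
# Count how often each remaining string occurs
tapply(i1, i1, length)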
#what can I do for you? 
#                     2
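Since the goal is only to count how often each trimmed string occurs, base R's `table()` can stand in for `tapply(i1, i1, length)`. This is a minimal sketch, assuming the two-row `b` from the question. `sub()` is enough here because the pattern is anchored with `^` and can match only once. The pattern `^\\S+\\s+\\S+\\s+` is a swapped-in variant that treats any run of non-space characters as a word, so punctuation such as the `?` in `you?` stays attached to its word, unlike with `\\w`.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps text as character
b <- data.frame(text = c("hello sunitha what can I do for you?",
                         "hi john what can I do for you?"),
                stringsAsFactors = FALSE)

# Drop the first two whitespace-separated words (anchored, so sub() suffices)
trimmed <- sub("^\\S+\\s+\\S+\\s+", "", b$text)

# Count how many times each remaining string occurs
counts <- table(trimmed)
print(counts)
```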

If you need to remove an arbitrary range of words instead, you can change how `i1` is built. This example removes words 2 to 4:

# Split on spaces, drop words 2 to 4, and rejoin the remaining words
i1 <- sapply(strsplit(as.character(b$text), ' '), function(i) paste(i[-c(2:4)], collapse = ' '))
tapply(i1, i1, length)
#hello I do for you?    hi I do for you? 
#                  1                   1 
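The range `2:4` above is hard-coded. A small helper can take the positions as a parameter; `drop_words()` is a hypothetical name introduced here for illustration. This sketch splits on runs of whitespace (`"\\s+"`) rather than a single space, so double spaces do not produce empty "words". It uses `vapply()` so the result is guaranteed to be a plain character vector.

```r
# Hypothetical helper: remove the words at positions `idx` from each string
drop_words <- function(text, idx) {
  words <- strsplit(as.character(text), "\\s+")  # split on runs of whitespace
  vapply(words, function(w) paste(w[-idx], collapse = " "), character(1))
}

b <- data.frame(text = c("hello sunitha what can I do for you?",
                         "hi john what can I do for you?"),
                stringsAsFactors = FALSE)

# Drop the first two words, then count identical results
first_two <- drop_words(b$text, 1:2)
print(table(first_two))

# Drop words 2 to 4, as in the answer above
middle <- drop_words(b$text, 2:4)
print(table(middle))
```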

Answer 1 (score: 0)

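library(magrittr)  # provides the %>% pipe used below
b <- data.frame(text = c("hello sunitha what can I do for you?", "hi john what can I do for you?"), stringsAsFactors = FALSE)
# Drop the first two words of each string and rejoin the rest
b$processed <- sapply(b$text, function(x) (strsplit(x, " ")[[1]] %>% .[-c(1:2)]) %>% paste0(., collapse = " "))
# Count the words left in each processed string
b$count <- sapply(b$processed, function(x) length(strsplit(x, " ")[[1]]))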
> b
                                  text              processed count
1 hello sunitha what can I do for you? what can I do for you?     6
2       hi john what can I do for you? what can I do for you?     6

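Is this what you are looking for? Note the `stringsAsFactors = FALSE`: without it (it was the default before R 4.0.0), your text column would be a factor, which is harder to work with.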
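The same two columns can also be built in base R without the `%>%` pipe. This is a minimal sketch, assuming "count" means the number of space-separated words left after dropping the first two. Each string is split once and the split is reused for both columns. `pmax()` guards against negative counts for strings with fewer than two words.

```r
b <- data.frame(text = c("hello sunitha what can I do for you?",
                         "hi john what can I do for you?"),
                stringsAsFactors = FALSE)

# Split each string into words once and reuse the result
words <- strsplit(b$text, " ")

# Rejoin everything except the first two words
b$processed <- vapply(words, function(w) paste(w[-(1:2)], collapse = " "),
                      character(1))

# Words remaining after the first two are dropped (never below zero)
b$count <- pmax(lengths(words) - 2L, 0L)
print(b)
```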