Question

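I have an input file that contains several paragraphs. I need to find the frequency of a particular word in one of those paragraphs.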

cat file:

Text    Index
train is good   1
let the train come      5
train is best   3
i m great       3
what is best    2

Code:

 input<-read.table("file",sep="\t",header=TRUE)
 paragraph1<-input[1][1]
 word<-"train"

I need to find the frequency of the word "train" in paragraph 1. How can I do this in R?

Answer 1

If you had provided more information, I could probably offer more in return. Using qdap you could do this:

library(qdap)

dat <- readLines(n=5)
train is good   1
let the train come      5
train is best   3
i m great       3
what is best    2

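# Split each line into text and index on runs of three or more spaces
dat <- do.call(rbind.data.frame, strsplit(dat, "   +"))

colnames(dat) <- c("Text", "Index")

# Count " train " (padded with spaces so only the whole word matches)
termco(dat$Text, , " train ")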

## > termco(dat$Text, , " train ")
##   all word.count     train
## 1 all         16 3(18.75%)

You can use termco to process all paragraphs at once. For more information on termco, see this link.

A lot depends on what separates the paragraphs, how you read them in, how they are indented, and so on.
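For comparison, here is a minimal base R sketch that does not need qdap. It assumes the layout shown in the question: a tab-separated file with a header row and one paragraph per row. The helper name count_word and the inline raw string are made up for illustration, and the tokenizer only handles plain ASCII letters, digits, and apostrophes.

```r
# Sample data from the question, inlined as tab-separated text (assumed layout)
raw <- "Text\tIndex\ntrain is good\t1\nlet the train come\t5\ntrain is best\t3\ni m great\t3\nwhat is best\t2"
dat <- read.table(text = raw, sep = "\t", header = TRUE,
                  stringsAsFactors = FALSE)

# Hypothetical helper: count whole-word, case-insensitive occurrences of
# `word` in each element of `text`
count_word <- function(text, word) {
  # Split on anything that is not a letter, digit, or apostrophe
  tokens <- strsplit(tolower(text), "[^a-z0-9']+")
  vapply(tokens, function(tok) sum(tok == tolower(word)), integer(1))
}

dat$train_count <- count_word(dat$Text, "train")
dat$train_count        # one count per paragraph (row)
sum(dat$train_count)   # total across all paragraphs
```

To get the count for a single paragraph, index the result, for example dat$train_count[1] for the first row.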

The poster found the following useful:

length(gregexpr("the", "the dog ate the word the", fixed = TRUE)[[1]])
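Two caveats apply to the one-liner above. When there is no match, gregexpr returns -1, so length() reports 1 instead of 0. And with fixed = TRUE it counts substrings, so "the" is also found inside "then". The sketch below works around both. The helper names count_fixed and count_whole are hypothetical, and count_whole assumes the word contains no regex metacharacters.

```r
# Hypothetical helper: count fixed-substring matches, returning 0 when none
count_fixed <- function(pattern, text) {
  m <- gregexpr(pattern, text, fixed = TRUE)[[1]]
  sum(m > 0)  # gregexpr yields -1 for "no match", which is excluded here
}

# Hypothetical helper: count whole-word matches using word boundaries
# (assumes `word` has no regex metacharacters)
count_whole <- function(word, text) {
  m <- gregexpr(paste0("\\b", word, "\\b"), text, perl = TRUE)[[1]]
  sum(m > 0)
}

count_fixed("the", "the dog ate the word the")  # three matches
count_fixed("cat", "the dog ate the word the")  # zero, where length() gives 1
count_fixed("the", "then there")                # substrings are counted too
count_whole("the", "then the other")            # only the standalone word
```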
