Question

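Since I am new to R and to programming in general, I hope someone can help me. I have a matrix with a body of text in column 2. For further analysis I would like to split that text into several parts of equal length, i.e. the same number of words (2 parts to start with). I also want to process these new parts further, so I would like them to be integrated into the existing matrix rather than as new columns.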

I have now found the split function and wonder whether it could solve my problem.
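As a rough sketch of how split might apply here: base R's split() divides a vector according to a grouping vector, so giving each word a part number yields parts whose word counts differ by at most one. The names text, n_parts, words, group and parts are made up for illustration, and splitting on single spaces is an assumption about how the text is formatted.

```r
# Example text taken from the second row of the sample matrix
text <- "The throat, neck sides and underparts are white, with orange flanks and breast"
n_parts <- 2  # assumed number of parts

# Break the text into single words
words <- strsplit(text, " ", fixed = TRUE)[[1]]

# Part number for every word: 1, 1, ..., 2, 2, ...
group <- ceiling(seq_along(words) * n_parts / length(words))

# split() groups the words by part number; paste() glues each group back together
parts <- vapply(split(words, group), paste, character(1), collapse = " ")
print(parts)
```

If a text has fewer words than n_parts, some part numbers never occur and split() returns fewer groups, so that case would need separate handling.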

Also, could I implement a dynamic word counter ("count every word in the message until the value is greater than ...")?
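One possible reading of such a counter, sketched under assumptions: keep a running word count and start a new part each time the count passes a multiple of a chosen limit. The function name split_by_word_limit and the parameter max_words are hypothetical, and words are again assumed to be separated by single spaces.

```r
# Hypothetical helper: cut a text into parts of at most max_words words each
split_by_word_limit <- function(text, max_words) {
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  count <- seq_along(words)                # running word count: 1, 2, 3, ...
  chunk <- (count - 1) %/% max_words + 1   # new chunk once the count passes a multiple of max_words
  unname(vapply(split(words, chunk), paste, character(1), collapse = " "))
}

text <- "The throat, neck sides and underparts are white, with orange flanks and breast"
parts <- split_by_word_limit(text, max_words = 5)
print(parts)
```

Unlike a fixed number of parts, this fixes the size of each part, so the last part may be shorter than the others.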

Any hints on how to proceed are much appreciated. Thank you in advance.

Edit 2: So far my code looks like this:

library(tm)
library(NLP)

TestMatrix2 = matrix(c("1", "2","The masked shrike (Lanius nubicus) is a bird in the shrike family, Laniidae. It breeds in southeastern","The throat, neck sides and underparts are white, with orange flanks and breast","17","13"),2,3)
colnames(TestMatrix2) = c("index","news body", "word count")

Test2 <- data.frame(strsplit(TestMatrix2[[1,1]], " "),stringsAsFactors=FALSE)
NewsPartitioning <- function(NumberOfParts = 2, NewsIndicator= 1){
  MaxWords = TestMatrix2[NewsIndicator,3]
  CritValue = TestMatrix2[NewsIndicator,3]/NumberOfParts
  as.integer(CritValue)
  new = list()
  colnames(Test2) = c("Words")

  for (i in 1:CritValue){new = c(new,Test2$Words[i])}
  new = unlist(new)
  TestMatrix[NewsIndicator,3+NumberOfParts] = paste(new, collapse = " ")

  for (i in CritValue+1:MaxWords){new = c(new,Test2$Words[i])}
  new = unlist(new)
  TestMatrix[Nachricht,3+NumberOfParts] = paste(new, collapse = " ")
}

Currently I get the error message "new columns would leave holes after existing columns".
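For context, R raises this message when a value is assigned to a data frame column whose index skips past the next free position, for example column 3+NumberOfParts = 5 in a table with only 3 columns. A minimal sketch of one way around it, assuming the data lives in a data frame, is to build all parts first and attach them in one step with cbind(). The names news, body, split_words, n_parts, part1 and part2 are hypothetical and not from the original code.

```r
# Sample texts from the question, stored in a data frame (an assumption;
# the question builds a character matrix)
news <- data.frame(
  index = 1:2,
  body = c("The masked shrike (Lanius nubicus) is a bird in the shrike family, Laniidae. It breeds in southeastern",
           "The throat, neck sides and underparts are white, with orange flanks and breast"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one text into n_parts pieces of nearly equal word count
split_words <- function(text, n_parts = 2) {
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  # Fixed factor levels guarantee exactly n_parts groups, even if some stay empty
  group <- factor(ceiling(seq_along(words) * n_parts / length(words)),
                  levels = seq_len(n_parts))
  vapply(split(words, group), paste, character(1), collapse = " ")
}

n_parts <- 2
# One row per text, one column per part
parts <- do.call(rbind, lapply(news$body, split_words, n_parts = n_parts))
colnames(parts) <- paste0("part", seq_len(n_parts))

# Attach all new columns at once instead of writing to a column index that does not exist yet
news <- cbind(news, as.data.frame(parts, stringsAsFactors = FALSE))
print(news[, c("part1", "part2")])
```

Computing the word count from the text itself avoids depending on a stored count column. In a character matrix such as TestMatrix2 that column holds strings like "17", so it would need as.integer() before any division.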

I suspect this program is neither efficient nor elegant. Any ideas or help?

Best regards, Basti

Split text into equal-sized parts

0 answers: