因为我是R的新手和一般的编程新手,我希望有人可以帮助我。我有一个矩阵,其中第2列中有文本正文。为了进一步分析,我想将该文本分成相等长度的多个部分,即相同数量的单词(开头为2)。我还想进一步处理这些新部件,所以我希望它们能够集成到现有的矩阵中,而不是新的列。
现在我找到了拆分功能,想知道我是否可以解决我的问题? Split function
另外,我可以实现一个动态字计数器(“计算消息中的每个字,直到值大于......”)?
非常感谢任何有关如何进步的提示。提前谢谢你。
编辑2: 到目前为止我的代码看起来像这样:
library(tm)
library(NLP)
TestMatrix2 = matrix(c("1", "2","The masked shrike (Lanius nubicus) is a bird in the shrike family, Laniidae. It breeds in southeastern","The throat, neck sides and underparts are white, with orange flanks and breast","17","13"),2,3)
colnames(TestMatrix2) = c("index","news body", "word count")
Test2 <- data.frame(strsplit(TestMatrix2[[1,1]], " "),stringsAsFactors=FALSE)
NewsPartitioning <- function(NumberOfParts = 2, NewsIndicator= 1){
MaxWords = TestMatrix2[NewsIndicator,3]
CritValue = TestMatrix2[NewsIndicator,3]/NumberOfParts
as.integer(CritValue)
new = list()
colnames(Test2) = c("Words")
for (i in 1:CritValue){new = c(new,Test2$Words[i])}
new = unlist(new)
TestMatrix[NewsIndicator,3+NumberOfParts] = paste(new, collapse = " ")
for (i in CritValue+1:MaxWords){new = c(new,Test2$Words[i])}
new = unlist(new)
TestMatrix[Nachricht,3+NumberOfParts] = paste(new, collapse = " ")
}
目前,我收到错误消息“新列会在现有列之后留下漏洞”。
我猜这个程序既不高效又不优雅。有什么想法或帮助吗?
祝你好运 巴斯蒂