Question

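I pulled some tweets from Twitter using R, but when I analyze the output I get a large count of blank entries and numbers. How can I remove these?

I am using the following code: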

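library(twitteR)  # searchTwitter, twListToDF
library(tm)       # removeWords, stopwords, removePunctuation
tweets <- searchTwitter('weather', n=10,lang='en')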
t <- twListToDF(tweets)
tw.text <- t[,"text"]
tw.text <- tolower(tw.text)
tw.text <- removeWords(tw.text,c(stopwords('en'),'rt'))
tw.text <- removePunctuation(tw.text,TRUE)
tw.text <- unlist(strsplit(tw.text,' '))
word <- sort(table(tw.text),TRUE)
wordc <- head(word,n=10)

When I run wordc, I get the following:

> wordc
tw.text
                       RT      weather       County          EST       Severe Thunderstorm      Warning           25        430PM 
          31            4            4            3            3            3            3            3            2            2

As you can see, I get 31 entries that are blank, 2 entries for 25, and 2 entries for 430PM. How can I remove these kinds of entries?

Answer 1

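After tw.text <- unlist(strsplit(tw.text,' ')), you have a vector of text elements. You can use the sub and which functions to keep only the values that are not blank. Here is an example: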

foo <- c("hi"," ","     ","test")
bar <- foo[which(sub(" +","",foo)!="")]
length(bar)
[1] 2
print(bar)
[1] "hi"   "test"

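Of course, if you want to strip the spaces from each entry, you can store the stripped values instead (i.e. sub(" +","",foo) gives you a vector with the first run of spaces removed from each element; use gsub in place of sub to remove every run).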
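
The example above only handles the blank entries. As a rough sketch of how the numeric tokens from the question (25, 430PM) could be dropped in the same pass, something like the following may work. The sample strings are made up for illustration and are not real tweets, and treating "any token containing a digit" as noise is an assumption you may want to loosen.

```r
# Hypothetical sample standing in for the cleaned tweet text (not real data).
tw.text <- c("severe  thunderstorm warning 25", " weather county 430pm ")

# Split on runs of whitespace so repeated spaces do not produce "" tokens.
tokens <- unlist(strsplit(tw.text, "[[:space:]]+"))

# A leading space still yields one "" token, so drop empty strings explicitly.
tokens <- tokens[nzchar(tokens)]

# Assumption: any token containing a digit (e.g. "25", "430pm") is unwanted.
tokens <- tokens[!grepl("[0-9]", tokens)]

# Count the remaining words, most frequent first.
word <- sort(table(tokens), decreasing = TRUE)
```

If you would rather keep mixed tokens such as 430pm and drop only pure numbers, the pattern "^[0-9]+$" could be used in grepl instead.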

Removing spaces and numbers in text analysis
