I want to count the number of words in each row:
Review_ID Review_Date Review_Content Listing_Title Star Hotel_Name
1 1/25/2016 I booked both the Crosby and Four Seasons but decided to cancel the Four Seasons closer to the arrival date based on reviews. Glad I did. The Crosby is an outstanding hotel. The rooms are immaculate and luxurious, with real attention to detail and none of the bland furnishings you find in even the top chain hotels. Staff on the whole were extremely attentive and seemed to enjoy being there. Breakfast was superb and facilities at ground level gave an intimate and exclusive feel to the hotel. It's a fairly expensive place to stay but is one of those hotels where you feel you're getting what you pay for, helped by an excellent location. Hope to be back! Outstanding 5 Crosby Street Hotel
2 1/18/2016 We've stayed many times at the Crosby Street Hotel and always have an incredible, flawless experience! The staff couldn't be more accommodating, the housekeeping is immaculate, the location's awesome and the rooms are the coolest combination of luxury and chic. During our most recent trip over The New Years holiday, we stayed in the stunning Crosby Suite which has the most extraordinary, gorgeous decor. The Crosby remains our absolute favorite in NYC. Can't wait to return! Always perfect! 5 Crosby Street Hotel
I was thinking of something like:
WordFreqRowWise %>%
rowwise() %>%
summarise(n = n())
to get a result like this:
Review_ID Review_Content total_Words Min_occrd_word Max Average
1 .... 230 great: 1 the: 25 total_unique/total_words in the row
But I have no idea how to go about it...
Answer 0 (score: 2)
Here is a base R approach using strsplit and sapply. It assumes that the data are stored in a data.frame named df and that the reviews are in the variable Review_Content.
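# split each review into words on the space character
# (as.character() guards against Review_Content being stored as a factor)
temp <- strsplit(as.character(df$Review_Content), split=" ")
# the word count of a row is the length of its vector of words
df$wordCount <- sapply(temp, length)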
Here, sapply returns a vector containing the word count for each row.
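One caveat with splitting on a single space is that consecutive spaces produce empty strings, which are then counted as words. The following is a minimal sketch of a variant that splits on runs of whitespace instead. The toy data frame is invented for illustration and only mimics the Review_ID and Review_Content columns from the question; `tokens` is a name introduced here.

```r
# Toy data standing in for the real reviews (hypothetical values)
df <- data.frame(
  Review_ID = 1:2,
  Review_Content = c("Glad I did.  The Crosby is an outstanding hotel.",
                     "Can't wait to return!"),
  stringsAsFactors = FALSE
)

# Trim leading/trailing whitespace, then split on runs of whitespace
# so that the double space in the first review does not yield an empty "word"
tokens <- strsplit(trimws(df$Review_Content), "\\s+")

# lengths() returns the length of each list element as an integer vector
df$wordCount <- lengths(tokens)
df$wordCount
```

lengths() does the same job as sapply(tokens, length) here. Punctuation stays attached to the words in this sketch, which does not affect the count.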
Because the word counts are now stored in a column of df, you can run whatever analysis you need on them. Here are a few examples:
summary(df$wordCount)
max(df$wordCount)
mean(df$wordCount)
range(df$wordCount)
IQR(df$wordCount)
Answer 1 (score: 1)
Adding to @lmo's answer above:
The code below builds a data frame that lists every word, row by row, together with its frequency:
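# temp is the list of word vectors created with strsplit in the answer above
temp2 <- data.frame()
for (i in seq_along(temp)) {
  # table() counts how often each distinct word occurs in row i
  temp1 <- as.data.frame(table(temp[[i]]))
  # label the counts with the row they came from
  temp1$ID <- paste0("Row_", i)
  temp2 <- rbind(temp2, temp1)
}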
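The per-row summary asked for in the question (total words, least and most frequent word, and the ratio of unique to total words) can be derived from the same kind of frequency table. This is only a rough sketch under a few assumptions: the helper `summarise_words` and the sample strings are made up for illustration, every review is assumed to contain at least one word, text is lower-cased, and punctuation is stripped (so "Can't" would become "cant"). Ties for the least or most frequent word go to the alphabetically first one.

```r
# Hypothetical sample reviews, short enough to check by hand
reviews <- c("the room and the staff and the view",
             "great great location")

# Summarise one review: totals plus least/most frequent word
summarise_words <- function(text) {
  # lower-case, drop punctuation, then split on runs of whitespace
  cleaned <- trimws(gsub("[[:punct:]]", "", tolower(text)))
  words <- strsplit(cleaned, "\\s+")[[1]]
  freq <- table(words)  # counts per distinct word, sorted alphabetically
  data.frame(
    total_words  = length(words),
    min_word     = names(freq)[which.min(freq)],
    min_count    = as.integer(min(freq)),
    max_word     = names(freq)[which.max(freq)],
    max_count    = as.integer(max(freq)),
    unique_ratio = length(freq) / length(words),
    stringsAsFactors = FALSE
  )
}

# One summary row per review, stacked into a single data frame
result <- do.call(rbind, lapply(reviews, summarise_words))
result$Review_ID <- seq_along(reviews)
result
```

Building the rows with lapply and binding them once avoids growing a data frame inside a loop, which can get slow when there are many reviews.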