Question

I scraped the following data from TripAdvisor:

'data.frame':   682 obs. of  6 variables:
 $ X            : int  1 2 3 4 5 6 7 8 9 10 ...
 $ id           : Factor w/ 674 levels "id","rn106322397",..: 672 671 670 669 668 667 666 665 664 663 ...
 $ quote        : Factor w/ 606 levels "\"Picturesque Lake Konigssee\"",..: 389 139 113 149 384 39 176 598 199 603 ...
 $ rating       : Factor w/ 6 levels "1","2","3","4",..: 3 5 5 5 4 5 5 5 4 5 ...
 $ date         : Factor w/ 505 levels "date","Reviewed 1 August 2014\n",..: 200 200 427 427 427 443 434 351 313 494 ...
 $ reviewnospace: Factor w/ 674 levels "- Good car parking facilities- Organized boat trips- Ensure that you have enough time at hand for the boat trip",..: 624 573 144 211 507 26 351 672 451 249 ...

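I am trying to cluster the data by date to get two groups: winter and summer vacationers. I then want to analyze the reviews within each group. I am using the tm package and tried the following code: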

> x <- read.csv ("seeganz.csv", header = TRUE, stringsAsFactors = FALSE, sep = ",")
> corp <- VCorpus(VectorSource(x$reviewnospace), readerControl = list(language = "eng"))
> meta(corp,tag = "date") <- x$date
> idx <- meta(corp, "date") == 'December'

But it does not work; the content shows 0 documents:

> corp [idx]
<<VCorpus>>
Metadata:  corpus specific: 0, document level (indexed): 1
Content:  documents: 0

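Since the dates are formatted like "Reviewed 1 August 2014", how can I adjust this code to get, for example, the reviews from November to February?

Do you have any idea how I can solve this?

Thanks.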

Answer 1

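General approach:

Use substr(date, 10, nchar(date)) to get to "1 August 2014", and call this new vector dateNew.
Use the normal date functions, e.g. as.Date(dateNew, ...), to convert dateNew into a vector of type Date, which you can then subset, subtract, and so on.

Reference, from http://www.statmethods.net/input/dates.html: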
```
# use as.Date( ) to convert strings to dates 
mydates <- as.Date(c("2007-06-22", "2004-02-13"))
# number of days between 6/22/07 and 2/13/04 
days <- mydates[1] - mydates[2]
```
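
Applied to the dates in the question, the two steps above could look roughly like the sketch below. It is only an illustration under a few assumptions: `date_raw` is a made-up sample that imitates the `date` column shown in the `str()` output (including a stray `"date"` header row), and the names `date_new`, `month_num`, `is_winter` and `is_summer` are hypothetical. The month is looked up in R's built-in `month.name` vector so the grouping does not depend on the locale, whereas `as.Date()` with `%B` only parses English month names in an English locale.

```r
# Hypothetical sample imitating the question's `date` column,
# including a stray header row ("date") and the trailing newline.
date_raw <- c("Reviewed 1 August 2014\n",
              "Reviewed 23 December 2013\n",
              "Reviewed 5 February 2014\n",
              "Reviewed 30 June 2014\n",
              "date")

# Step 1: drop the "Reviewed " prefix and surrounding whitespace.
date_new <- trimws(sub("^Reviewed +", "", date_raw))

# Step 2 (optional): convert to Date. %B expects month names in the
# current locale, so this yields NA outside an English locale.
date_parsed <- as.Date(date_new, format = "%d %B %Y")

# Locale-independent month lookup: extract the month name and match it
# against the built-in month.name vector (January = 1, ..., December = 12).
month_name <- sub("^[0-9]+ +([A-Za-z]+) +[0-9]+$", "\\1", date_new)
month_num  <- match(month_name, month.name)

# Winter = November to February; unparseable rows (NA) fall in neither group.
is_winter <- !is.na(month_num) & month_num %in% c(11, 12, 1, 2)
is_summer <- !is.na(month_num) & !is_winter
```

A logical vector like `is_winter` could then take the place of `idx` in the question, for example `corp[is_winter]`, provided the corpus was built from the same rows in the same order.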

TM - Clustering data using a special date variable
