Accessing elements in a corpus

Time: 2015-10-14 17:42:50

Tags: r tm corpus

I am using the Corpus function to read a file I created in the directory shown below.

chk <- Corpus(DirSource("C:\\Users\\TCS Profile\\Documents\\R\\Machine Learning Text\\Naive Bayes"))

After creating the corpus, I inspected the variable chk and can see that the content has been read:

 str(chk)
List of 1
 $ Test.txt:List of 2
  ..$ content: chr [1:7] "Hi Wassup" "How are You" "Hope it Works!!!" "" ...
  ..$ meta   :List of 7
  .. ..$ author       : chr(0) 
  .. ..$ datetimestamp: POSIXlt[1:1], format: "2015-10-14 16:15:17"
  .. ..$ description  : chr(0) 
  .. ..$ heading      : chr(0) 
  .. ..$ id           : chr "Test.txt"
  .. ..$ language     : chr "en"
  .. ..$ origin       : chr(0) 
  .. ..- attr(*, "class")= chr "TextDocumentMeta"
  ..- attr(*, "class")= chr [1:2] "PlainTextDocument" "TextDocument"
 - attr(*, "class")= chr [1:2] "VCorpus" "Corpus"

The problem is that I can't access a specific value in the content, say the 3rd element ("Hope it Works!!!"). I tried the following code:

chk[[1]][1,3]
  

Error in chk[[1]][1, 3] : incorrect number of dimensions

Can anyone tell me how to access that element, and why this kind of access produces the error above?

1 Answer:

Answer 0 (score: 1)

This should work:

> chk[[1]][1]$content[3]
#[1] "Hope it Works!!!"
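As for why the original attempt fails: chk[[1]] is a single PlainTextDocument, which is an R list with the elements content and meta. It is not a matrix or data frame, so it has no dim attribute, and two-index subsetting like [1,3] raises "incorrect number of dimensions". The base-R sketch below reproduces this with a plain list shaped like a document. The list doc is a simplified, hypothetical stand-in, not a real tm object, so it runs without tm installed.

```r
# Hypothetical stand-in shaped like a tm PlainTextDocument
# (a real one also has a class attribute and a fuller meta list).
doc <- list(
  content = c("Hi Wassup", "How are You", "Hope it Works!!!", ""),
  meta    = list(id = "Test.txt", language = "en")
)

# Matrix-style [row, col] indexing needs a dim attribute, which a list lacks,
# so this fails the same way chk[[1]][1,3] does; capture the error message.
err <- tryCatch(doc[1, 3], error = function(e) conditionMessage(e))
print(err)

# Select the character vector first, then index into it.
third <- doc$content[3]
print(third)
#[1] "Hope it Works!!!"
```

The same idea applies to the real corpus: on a VCorpus, [[ picks out a document, $content picks out its lines, and a single [ picks out one line. With tm loaded, the [1] in the answer's chk[[1]][1]$content[3] only keeps the first list element, so chk[[1]]$content[3] should give the same result.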

I reproduced your example with this data:

chk <-structure(list(content = list(structure(list(content =    c("Hi Wassup ", "How are You ", "Hope it Works!!!", "", "long time no see ", "Howdy", "Yo"), 
meta = structure(list(author = character(0),  datetimestamp = structure(list(sec = 12.238600730896, min = 17L, hour = 19L, mday = 14L, mon = 9L, year = 115L, wday = 3L, yday = 286L, isdst = 0L), 
.Names = c("sec", "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"), 
class = c("POSIXlt", "POSIXt"), tzone = "GMT"), description = character(0), heading = character(0), id = "Test.txt", language = "en", 
origin = character(0)), .Names = c("author", "datetimestamp", "description", "heading", "id", "language", "origin"), 
class = "TextDocumentMeta")), .Names = c("content", "meta"), class = c("PlainTextDocument", "TextDocument"))), meta = structure(list(), class = "CorpusMeta"), 
dmeta = structure(list(), .Names = character(0), row.names = 1L, class = "data.frame")), 
.Names = c("content", "meta", "dmeta"), class = c("VCorpus", "Corpus"))
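If you prefer not to depend on the internal list layout, tm also provides accessor functions. The sketch below assumes the tm package is installed. It writes a small Test.txt into a temporary directory, which stands in for the asker's folder (demo_dir is a hypothetical name). It then reads the file with VCorpus(DirSource(...)) and uses content() and meta() on the first document. VCorpus is used explicitly because newer tm versions may make Corpus() return a SimpleCorpus, which may store each file as a single string rather than one element per line.

```r
library(tm)

# Hypothetical temporary folder standing in for the asker's "Naive Bayes" directory.
demo_dir <- file.path(tempdir(), "corpus_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("Hi Wassup", "How are You", "Hope it Works!!!", ""),
           file.path(demo_dir, "Test.txt"))

# With VCorpus + DirSource, each file becomes a PlainTextDocument whose
# content is the character vector of the file's lines.
chk <- VCorpus(DirSource(demo_dir))

doc <- chk[[1]]               # first document in the corpus
doc_lines <- content(doc)     # accessor for the document's text
third <- doc_lines[3]
print(third)

# Direct list access, as in the answer above, gives the same value.
print(chk[[1]]$content[3])

# Metadata goes through meta(); the id is the file name.
print(meta(doc, "id"))
```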