When I inspect my TermDocumentMatrix, the column headers show numbers instead of the file names (of the PDFs).
Docs
Terms **10 11 2 3 4 5 6 7 8 9**
abil 1 2 0 0 0 0 0 0 0 1
abl 0 1 0 0 6 0 1 0 0 0
access 4 6 0 0 3 0 0 0 0 1
accord 0 2 1 0 2 0 0 0 0 2
account 3 2 0 0 0 0 1 0 0 1
activ 5 18 2 5 14 1 3 2 2 10
addit 3 1 2 0 0 1 2 0 3 2
address 1 1 2 1 0 0 0 0 2 3
adequ 0 2 0 0 2 2 0 0 0 1
adequaci 1 0 0 0 1 1 0 0 2 2
These are my steps so far:
setwd("E:/OneDrive/Thesis/Received comments document/Consultation 14")
getwd()
library(pdftools)
library(tm)  # needed for Corpus(), VectorSource(), TermDocumentMatrix(), inspect()
# Read the text of every PDF in the working directory
files <- list.files(pattern = "pdf$")
comments <- lapply(files, pdf_text)
# Build the corpus and the term-document matrix
corp <- Corpus(VectorSource(comments))
Comments.tdm <- TermDocumentMatrix(corp, control = list(removePunctuation = TRUE,
                                                        stopwords = TRUE,
                                                        tolower = TRUE,
                                                        stemming = TRUE,
                                                        removeNumbers = TRUE,
                                                        bounds = list(global = c(3, Inf))))
inspect(Comments.tdm[1:11,])
I tried to solve this by using:
meta(corp[[1]], tag = "id") <- files[1]
which returns the error message:
**Error in `[.data.frame`(x$dmeta, tag) : undefined columns selected**
How can I make sure the column headers show the file names of the PDFs?
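
To make the goal concrete, here is a minimal sketch of the result I am after. It is not a confirmed fix, only an illustration under stated assumptions: the file names and texts below are hypothetical stand-ins for `files` and for the output of `pdf_text()`, each document is a single character string (since `pdf_text()` returns one string per page, the pages would have to be pasted together per file), and the columns of the matrix are assumed to follow the order of the documents in the corpus. It relabels the columns with `colnames()` after the matrix is built, rather than going through `meta()`.

```r
library(tm)

# Hypothetical stand-ins for `files` and for the text extracted from each PDF
files <- c("comment_a.pdf", "comment_b.pdf", "comment_c.pdf")
pages <- list(
  c("access to the account", "is adequate"),   # a two-page document
  "additional activity on the account",
  "the address and access were adequate"
)

# Collapse the pages of each document into one string per file
texts <- vapply(pages, paste, character(1), collapse = " ")

# Build the corpus and the term-document matrix
corp <- VCorpus(VectorSource(texts))
tdm  <- TermDocumentMatrix(corp, control = list(removePunctuation = TRUE,
                                                stopwords = TRUE,
                                                tolower = TRUE))

# Assumption: column order matches corpus order, so the file names line up
colnames(tdm) <- files

inspect(tdm)
```

With real data, `pages` would be replaced by `lapply(files, pdf_text)`; whether this is the idiomatic way to attach file names in tm is something I have not confirmed.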