R: Appending multiple rows to a data frame in a for loop

Time: 2018-09-18 16:55:04

Tags: r append text-mining

I have PDF files that I created from Wikipedia pages such as these:

https://en.wikipedia.org/wiki/AIM-120_AMRAAM

https://en.wikipedia.org/wiki/AIM-9_Sidewinder

I have a list of keywords that I want to search for in the documents, extracting the sentences in which they appear.

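keywords <- c("altitude", "range", "speed")

I can open a file, extract the text from the PDF, and pull out the sentences that contain a keyword. This works when I do it for each keyword individually, but when I try to do it in a loop, the rows are not appended. Instead it is almost doing a cbind, and then an error is thrown about the number of columns. Here is my code; any help with getting this to work is much appreciated.

pdf.files <- list.files(path = "/path/to/file", pattern = "*.pdf", full.names = FALSE, recursive = FALSE)
for (i in 1:length(pdf.files)) {
    for (j in 1:length(keywords)) {
        text <- pdf_text(file.path("path", "to", "file", pdf.files[i]))
        text2 <- tolower(text)
        text3 <- gsub("\r", "", text2)
        text4 <- gsub("\n", "", text3)
        text5 <- grep(keywords[j], unlist(strsplit(text4, "\\.\\s+")), value = TRUE)
    }
    temp <- rbind(text5)
    assign(pdf.files[i], temp)
}

How do I get the rows to append correctly and appear for each PDF file?

Once the rows append correctly, the next step will be to add the keyword as a variable to the left of each extracted sentence, so that every output row holds the keyword followed by its sentence.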

Would this be done inside the loop or as a separate function?

Thanks for your help.

1 answer:

Answer 0: (score: 0)

OK, this took some real thought, but I got it working. It isn't pretty, but it gets the job done:

# This first part initializes the files to be written to
library(pdftools)  # provides pdf_text()
pdf.dir <- "/path/to/file"
# `pattern` is a regular expression, not a glob, so anchor it on the extension
pdf.files <- list.files(path = pdf.dir, pattern = "\\.pdf$", full.names = FALSE, recursive = FALSE)
for (h in seq_along(pdf.files)) {
    temp1 <- data.frame(matrix(ncol = 2, nrow = 0))
    colnames(temp1) <- c("Title", "x")
    # Writes only the header row; file.path() supplies the "/" that paste0() left out
    write.table(temp1, file.path(pdf.dir, paste0(tools::file_path_sans_ext(pdf.files[h]), ".txt")), sep = "\t", row.names = FALSE, quote = FALSE)
}
# This next part fills in the files with the sentences
for (i in seq_along(pdf.files)) {
    # Read and clean each PDF once rather than once per keyword
    text <- tolower(pdf_text(file.path(pdf.dir, pdf.files[i])))
    # Replace line breaks with a space so words and sentence boundaries stay separated
    text <- gsub("[\r\n]+", " ", text)
    sentences <- unlist(strsplit(text, "\\.\\s+"))
    out.file <- file.path(pdf.dir, paste0(tools::file_path_sans_ext(pdf.files[i]), ".txt"))
    for (j in seq_along(keywords)) {
        matches <- grep(keywords[j], sentences, value = TRUE)
        if (length(matches) != 0) {
            # The keyword is recycled down the Title column, one row per matching sentence
            temp <- unique(data.frame(Title = keywords[j], x = matches))
            write.table(temp, out.file, sep = "\t", row.names = FALSE, quote = FALSE, col.names = FALSE, append = TRUE)
        }
    }
}
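
For comparison, here is a minimal sketch of a different approach that keeps everything in memory instead of appending to text files. In the question's loop, `text5` is overwritten on every pass of the inner loop and `rbind(text5)` only runs after that loop finishes, so only the last keyword's matches survive. In addition, `rbind()` on a single character vector returns a one-row matrix with one column per sentence, which may be what looked like a cbind. The sketch below builds one small data frame per keyword and binds them once with `do.call(rbind, ...)`. The helper name `extract_keyword_sentences`, the column names `keyword` and `sentence`, and the sample text are all assumptions made for illustration; none of them come from the original post.

```r
# Hypothetical helper (not from the original post): returns one row per
# (keyword, sentence) match for a character vector of page text.
extract_keyword_sentences <- function(pages, keywords) {
  # Join the pages, lower-case, and turn line breaks into spaces
  text <- tolower(paste(pages, collapse = " "))
  text <- gsub("[\r\n]+", " ", text)
  # Naive sentence split: a period followed by whitespace
  sentences <- unlist(strsplit(text, "\\.\\s+"))
  # One data frame per keyword; zero matches gives a zero-row frame
  rows <- lapply(keywords, function(kw) {
    hits <- grep(kw, sentences, value = TRUE, fixed = TRUE)
    data.frame(keyword = rep(kw, length(hits)), sentence = hits,
               stringsAsFactors = FALSE)
  })
  # Bind all per-keyword frames in a single call and drop exact duplicates
  unique(do.call(rbind, rows))
}

# Made-up sample text standing in for the output of pdf_text()
pages <- c("The missile has a long range. It flies at high speed.\nIts ceiling is classified.",
           "Range depends on launch altitude. Speed also matters.")
keywords <- c("altitude", "range", "speed")

result <- extract_keyword_sentences(pages, keywords)
print(result)
```

With real PDFs, one could presumably call the helper once per file, for example `extract_keyword_sentences(pdftools::pdf_text(path), keywords)`, and then write each result with a single `write.table()` call rather than appending row by row.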