Why am I getting this error: "Error: object 'folder' not found"?

Time: 2020-04-11 03:16:45

Tags: r

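My R code is below, and I'm not sure why the object folder cannot be found. I first unpack the tar file with untar(). Then I set a training folder pointing to the 20news-bydate-train data, use the read_folder function I wrote to read each folder, and build a data frame holding the newsgroup name, the message ID, and the accompanying text.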

library(dplyr)
library(tidyr)
library(purrr)

url <- "http://qwone.com/~jason/20Newsgroups/20news-bydate.tar.gz"
download.file(url, destfile = "20news-bydate.tar.gz")
untar("20news-bydate.tar.gz")

training_folder <- "20news-bydate-train"

# Create a function to read all files from a folder into a data frame
read_folder <- function(infolder) {
  data_frame(file = dir(infolder, full.names = TRUE)) %>%
    mutate(text = map(file, read_lines)) %>%
    transmute(id = basename(file), text) %>%
    unnest(text)
}

# Use unnest() and map() to apply read_folder to each subfolder
(raw_text <- data_frame(folder = dir(training_folder, full.names = TRUE)) %>%
    unnest(map(folder, read_folder)) %>%
    transmute(newsgroup = basename(folder), id, text))

1 answer:

Answer 0 (score: 0)

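Note: the read_lines function used in read_folder comes from the readr package, so library(readr) is required. It is missing from the question. I don't know the exact answer to "why am I getting this error"; what follows is an attempt to fix it.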


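Most likely problem:
When the values to unnest are computed with map(), the list-column must first be created with mutate() and then passed to unnest() by name. The question probably uses syntax from an older tidyr release: since tidyr 1.0.0, unnest() treats its argument as a column selection (cols) rather than an expression to evaluate, so map(folder, read_folder) is not evaluated inside the data frame and folder cannot be found. Adding this small mutate() step makes the data process correctly.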
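A minimal offline sketch of the mutate-then-unnest pattern, assuming tidyr 1.0.0 or later. fake_read_folder() and the two folder names are hypothetical stand-ins, so nothing is downloaded or read from disk; only the shape of the pipeline matters here.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for read_folder(): returns a small tibble
# for each folder name, so no files or downloads are needed.
fake_read_folder <- function(folder) {
  tibble(id = c("1", "2"), text = paste(folder, c("line a", "line b")))
}

toy <- tibble(folder = c("alt.atheism", "sci.space")) %>%
  # Step 1: store each folder's result in a list-column
  mutate(temp = map(folder, fake_read_folder)) %>%
  # Step 2: unnest that list-column by name (tidyr >= 1.0.0 syntax)
  unnest(cols = temp)

print(toy)
```

The folder column survives unnesting and is repeated once per row of each nested tibble, which is what lets the original transmute(newsgroup = basename(folder), ...) step work afterwards.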

Possible solution:

library(dplyr)
library(tidyr)
library(purrr)
library(readr)

url <- "http://qwone.com/~jason/20Newsgroups/20news-bydate.tar.gz"
download.file(url, destfile = "20news-bydate.tar.gz")
untar("20news-bydate.tar.gz")

training_folder <- "20news-bydate-train"

# Create a function to read all files from a folder into a data frame
read_folder <- function(infolder) {
  tibble(file = dir(infolder, full.names = TRUE)) %>%
    mutate(text = map(file, read_lines)) %>%
    transmute(id = basename(file), text) %>%
    unnest(text)
}

# Create the list-column with mutate() first, then unnest() it by name
raw_text <- tibble(folder = dir(training_folder, full.names = TRUE)) %>%
  mutate(temp = map(folder, read_folder)) %>%
  unnest(cols = temp) %>%
  transmute(newsgroup = basename(folder), id, text)

Convert to a plain data frame:

raw_text_df <- as.data.frame(raw_text)

The output looks like this:

> print(head(raw_text_df ))
    newsgroup    id                                                               text
1 alt.atheism 49960                                 From: mathew <mathew@mantis.co.uk>
2 alt.atheism 49960                        Subject: Alt.Atheism FAQ: Atheist Resources
3 alt.atheism 49960    Summary: Books, addresses, music -- anything related to atheism
4 alt.atheism 49960 Keywords: FAQ, atheism, books, music, fiction, addresses, contacts
5 alt.atheism 49960                             Expires: Thu, 29 Apr 1993 11:57:19 GMT
6 alt.atheism 49960                                                Distribution: world
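To check the corrected pipeline without downloading the archive, the sketch below builds a hypothetical miniature copy of the 20news folder layout under tempdir() (the newsgroup folders, file names, and message lines are made up for illustration) and runs the same read_folder() / mutate() / unnest() steps on it. It assumes tidyr 1.0.0 or later and that readr is installed.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)
library(readr)

# Hypothetical miniature layout: <root>/<newsgroup>/<message id>,
# one message per file, mirroring 20news-bydate-train.
root <- file.path(tempdir(), "mini-news")
unlink(root, recursive = TRUE)  # start from a clean directory
dir.create(file.path(root, "alt.atheism"), recursive = TRUE)
dir.create(file.path(root, "sci.space"), recursive = TRUE)
writeLines(c("From: a@example.com", "Subject: hello"),
           file.path(root, "alt.atheism", "1001"))
writeLines("Subject: orbit", file.path(root, "sci.space", "2001"))

# Same logic as read_folder() above: one row per line of each file
read_folder <- function(infolder) {
  tibble(file = dir(infolder, full.names = TRUE)) %>%
    mutate(text = map(file, read_lines)) %>%
    transmute(id = basename(file), text) %>%
    unnest(cols = text)
}

mini <- tibble(folder = dir(root, full.names = TRUE)) %>%
  mutate(temp = map(folder, read_folder)) %>%
  unnest(cols = temp) %>%
  transmute(newsgroup = basename(folder), id, text)

print(mini)
```

Each row is one line of one message, tagged with its newsgroup (the folder name) and id (the file name), the same shape as raw_text_df above.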

Hope this helps.