Converting multiple text files to CSV using os.walk

Time: 2018-05-15 16:41:50

Tags: python csv

I have multiple .txt files in a folder and its subfolders, as shown below:

  • States (folder)
    • Arizona (subfolder)
      • FILE1.TXT
      • FILE2.TXT
      • file3.txt
    • Alaska (subfolder)
      • FILE1.TXT
      • FILE2.TXT
      • file3.txt
      • file4.txt
    • Arkansas (subfolder)
      • FILE1.TXT
      • FILE2.TXT

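I need to convert all the files to CSV and merge the CSV files of each folder into one (e.g. arizona_files.csv, alaska_files.csv). I tried the code below, but it produces no output. Any idea what I am doing wrong?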


2 answers:

Answer 0 (score: 2)

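As described in https://docs.python.org/3/library/os.html, the file names yielded by os.walk() contain no path components: "To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name)." That is why you are getting this error.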
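A minimal sketch of the fix this answer describes: join every bare name with the `dirpath` that `os.walk()` yields alongside it. The helper name `find_txt_files` is hypothetical, and the extension check is case-insensitive only because the question's tree mixes `FILE1.TXT` and `file3.txt`.

```python
import os


def find_txt_files(top):
    """Return the full path of every .txt file under top (hypothetical helper)."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            # filenames holds bare names such as "file3.txt"; prefixing the
            # matching dirpath yields a path open() can resolve from any cwd.
            if name.lower().endswith(".txt"):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```

Calling `find_txt_files("States")` would return paths such as `States/Arizona/FILE1.TXT` (with the platform's separator), which can be opened regardless of where the script is launched from.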

Answer 1 (score: 0)

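You are not running the code from the right directory. When you launch the script from the command prompt, it has to be at the top level of the path you are iterating over, i.e. in the States folder or above it, and started from that location. Alternatively, you can change in_txt as follows: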

in_txt = csv.reader(open(os.path.join(path, filename), newline=""), delimiter="\t")

This tells csv.reader exactly where to find the current file. You will also have to do the same kind of thing when writing the CSV:

out_csv = csv.writer(open(os.path.join(path, os.path.splitext(filename)[0] + ".csv"), "w", newline=""))
out_csv.writerows(in_txt)
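Combining both answers, here is a minimal sketch of the whole task. It rests on assumptions the question does not state: each .txt file is tab-delimited (the delimiter Answer 1 uses), the state folders sit directly under the top folder, and each merged file is named `<subfolder in lowercase>_files.csv` (matching the `arizona_files.csv` example) and written into the top folder. The function name `merge_state_files` is hypothetical. Files are opened in text mode with `newline=""`, as the Python 3 csv documentation recommends, rather than in binary mode.

```python
import csv
import os


def merge_state_files(top, delimiter="\t"):
    """Merge the .txt files of each subfolder of top into one CSV per subfolder.

    Returns the list of CSV paths written (hypothetical helper).
    """
    written = []
    for dirpath, dirnames, filenames in os.walk(top):
        txt_names = sorted(n for n in filenames if n.lower().endswith(".txt"))
        # Skip the top folder itself and any folder without .txt files.
        if dirpath == top or not txt_names:
            continue
        folder = os.path.basename(dirpath)
        out_path = os.path.join(top, folder.lower() + "_files.csv")
        with open(out_path, "w", newline="") as out_file:
            writer = csv.writer(out_file)
            for name in txt_names:
                # os.walk yields bare names, so join them with dirpath.
                with open(os.path.join(dirpath, name), newline="") as in_file:
                    writer.writerows(csv.reader(in_file, delimiter=delimiter))
        written.append(out_path)
    return written
```

Running `merge_state_files("States")` would produce `arizona_files.csv`, `alaska_files.csv` and `arkansas_files.csv` inside States. If every input file starts with a header row, you would want to skip it for all but the first file of each folder.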