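I have 10 tab-separated txt files in a folder. Each has three columns (numbers only), preceded by 21 header lines (text and numbers). To process them further, I would like to:
I know very little about scripting. I have RStudio and Python and have tried fiddling with both, but I really don't know how to proceed. Since I have to process several folders, being able to do this would simplify my work considerably.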
Answer 0 (score: 1)
Based on your requirements, it sounds like this Python code should do the trick:
import os
import glob

DIR = "path/to/your/directory"
OUTPUT_FILE = "path/to/your/output.csv"
HEADER_SIZE = 21

input_files = glob.glob(os.path.join(DIR, "*.txt"))

for input_file in input_files:
    print("Now processing", input_file)

    # read the file
    with open(input_file, "r") as h:
        contents = h.readlines()

    # drop header
    contents = contents[HEADER_SIZE:]

    # grab the 2nd column
    column = []
    for row in contents:
        # stop at the footer
        if "####" in row:
            break

        split = row.split("\t")
        if len(split) >= 2:
            column.append(split[1])

    # replace the comma
    column_replaced = [row.replace(",", ".") for row in column]

    # append to the output file
    with open(OUTPUT_FILE, "a") as h:
        h.write("\n".join(column_replaced))
        h.write("\n")  # end on a newline
Note that this discards everything except the second column in the output file.
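Since the question mentions having to process several folders, the same steps could be wrapped in a function and called once per folder. The following is only a sketch under the same assumptions as the code above (21 header lines, tab-separated columns, a footer line containing "####", decimal commas). The function name `extract_second_column` and the folder paths are hypothetical and not part of the original answer.

```python
import glob
import os


def extract_second_column(folder, output_file, header_size=21):
    """Append the 2nd column of every .txt file in `folder` to `output_file`.

    Returns the number of values written.
    """
    written = 0
    # sorted() makes the processing order reproducible
    for input_file in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        with open(input_file, "r") as h:
            rows = h.readlines()[header_size:]  # drop the header

        values = []
        for row in rows:
            if "####" in row:  # stop at the footer
                break
            parts = row.rstrip("\n").split("\t")
            if len(parts) >= 2:
                # take the 2nd column and turn decimal commas into dots
                values.append(parts[1].replace(",", "."))

        if values:
            with open(output_file, "a") as h:
                h.write("\n".join(values) + "\n")
        written += len(values)
    return written


if __name__ == "__main__":
    # Hypothetical folder names; replace them with your own.
    for folder in ["path/to/folder_a", "path/to/folder_b"]:
        count = extract_second_column(folder, os.path.join(folder, "output.csv"))
        print(folder, "->", count, "values")
```

Writing the output as a `.csv` file keeps it from being picked up by the `*.txt` pattern if the script is run again on the same folder. Because the output is opened in append mode, a second run adds to the existing file rather than replacing it.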
Answer 1 (score: 0)
The code below is not an exact solution, but if you work along its lines you will get close to what you need.
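output <- "NewFileName.txt"
old_dir <- setwd("your/folder")
# the first argument of list.files() is the path, so pass the regex as `pattern`
files <- list.files(pattern = "\\.txt$")
# skip the 21 header lines of each file
df_list <- lapply(files, read.table, skip = 21, sep = "\t")
# keep only the 2nd column of each file
x <- lapply(df_list, '[[', 2)
# replace decimal commas with dots
x <- gsub(",", ".", unlist(x))
# quote = FALSE writes the values without surrounding quotes
write.table(x, output, row.names = FALSE, col.names = FALSE, quote = FALSE)
setwd(old_dir)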
Answer 2 (score: 0)
filename = "my_text"

# read every line, replacing commas with dots
lines = []
with open(filename, "r") as f:
    for line in f:
        res = line.replace(",", ".")
        lines.append(res)
        print(res)

# write the converted lines back to the same file
with open(filename, "w") as f:
    for item in lines:
        f.write(item)