Question

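I have 10 tab-delimited txt files in a folder. Each has three columns (numbers only), preceded by a 21-line header (text and numbers). To process them further, I would like to: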

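Select the second column from every text file (starting after the 21-line header; I marked it with an arrow), convert the decimal commas to decimal points, and stack the columns from all 10 files into a new tab-delimited/CSV file. All files at once.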

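I know very little about scripting. I have RStudio and Python and have tried fiddling with them a bit, but I really don't know what to do. Since I have to process several folders, being able to do this would simplify my work considerably.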

Answer 1

Based on your requirements, it sounds like this Python code should do the trick:

import os
import glob

DIR = "path/to/your/directory"
OUTPUT_FILE = "path/to/your/output.csv"
HEADER_SIZE = 21

input_files = glob.glob(os.path.join(DIR, "*.txt"))

for input_file in input_files:
    print("Now processing", input_file)

    # read the file
    with open(input_file, "r") as h:
        contents = h.readlines()

    # drop header
    contents = contents[HEADER_SIZE:]

    # grab the 2nd column
    column = []
    for row in contents:
        # stop at the footer
        if "####" in row:
            break

        split = row.split("\t")

        if len(split) >= 2:
            column.append(split[1])

    # replace the comma
    column_replaced = [row.replace(",", ".") for row in column]

    # append to the output file
    with open(OUTPUT_FILE, "a") as h:
        h.write("\n".join(column_replaced))
        h.write("\n")  # end on a newline

Note that this discards everything except the second column in the output file.
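Because the script above opens the output file in append mode, running it twice adds the same rows again. A minimal sketch of the same steps wrapped in functions follows, which may be handy when several folders have to be processed. The names `extract_second_column` and `stack_directory` are made up for this illustration, and the `####` footer marker is taken from the code above.

```python
import glob
import os


def extract_second_column(lines, header_size=21, footer_marker="####"):
    """Return the second tab-separated field of each data row, with ',' replaced by '.'."""
    values = []
    for row in lines[header_size:]:
        if footer_marker in row:
            break  # stop at the footer, as in the script above
        fields = row.rstrip("\n").split("\t")
        if len(fields) >= 2:
            values.append(fields[1].strip().replace(",", "."))
    return values


def stack_directory(directory, output_file, header_size=21):
    """Stack the second column of every .txt file in `directory` into `output_file`."""
    stacked = []
    for input_file in sorted(glob.glob(os.path.join(directory, "*.txt"))):
        with open(input_file, "r") as handle:
            stacked.extend(extract_second_column(handle.readlines(), header_size))
    # "w" overwrites the file, so a second run does not duplicate rows
    with open(output_file, "w") as handle:
        handle.write("\n".join(stacked) + "\n")
    return len(stacked)


# Hypothetical usage, one call per folder:
# stack_directory("path/to/your/directory", "path/to/your/output.csv")
```

If the output file is given a `.txt` extension and placed in the input folder, it would be picked up as an input on the next run, so a different extension or location is the safer choice.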

Answer 2

The code below is not an exact solution, but if you follow along its lines you will get close to what you need.

output <- "NewFileName.txt"

old_dir <- setwd("your/folder")
# the first positional argument of list.files() is the path, so name the pattern
files <- list.files(pattern = "\\.txt$")
df_list <- lapply(files, read.table, skip = 21, sep = "\t")
x <- lapply(df_list, '[[', 2)
x <- gsub(",", ".", unlist(x))
write.table(x, output, row.names = FALSE, col.names = FALSE)
setwd(old_dir)
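The R snippet above concatenates all second columns into one long vector. If "stacking" is instead meant as one output column per input file, a Python sketch along the following lines might work. The function name `write_side_by_side` and the sample values are invented for the example, and shorter columns are padded with empty cells as an assumption.

```python
import csv
import io
from itertools import zip_longest


def write_side_by_side(columns, handle, fill=""):
    """Write one output column per input list; shorter lists are padded with `fill`."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for row in zip_longest(*columns, fillvalue=fill):
        writer.writerow(row)


# Example with made-up values: two "files" of different length
buffer = io.StringIO()
write_side_by_side([["1.5", "2.5", "3.5"], ["10.0", "20.0"]], buffer)
print(buffer.getvalue())
```

For real use, `handle` would be a file opened with `open(path, "w", newline="")`, and each entry in `columns` would be the list of converted values from one input file.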

Answer 3

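filename = "my_text"

# read every line and replace decimal commas with points
lines = []
with open(filename, "r") as file:
    for line in file:
        res = line.replace(",", ".")
        lines.append(res)
        print(res)

# write the converted lines back to the same file
with open(filename, "w") as f:
    for item in lines:
        f.write(item)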
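The snippet above overwrites a single file and replaces every comma in it, including any in the header lines. A cautious variant for a whole folder could write converted copies and leave the originals untouched. This is only a sketch: the function names and the `_converted` suffix are assumptions chosen for illustration.

```python
from pathlib import Path


def convert_decimal_commas(source, target):
    """Copy `source` to `target`, replacing every ',' with '.'."""
    text = Path(source).read_text()
    Path(target).write_text(text.replace(",", "."))


def convert_folder(folder, suffix="_converted"):
    """Write a converted copy next to every .txt file in `folder`."""
    written = []
    for source in sorted(Path(folder).glob("*.txt")):
        if source.stem.endswith(suffix):
            continue  # skip copies produced by an earlier run
        target = source.with_name(source.stem + suffix + source.suffix)
        convert_decimal_commas(source, target)
        written.append(target)
    return written


# Hypothetical usage:
# convert_folder("path/to/your/directory")
```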

Replace comma with decimal in multiple text files
