Question

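I have several hundred text files that contain a lot of information. Each file has 3 columns (the first two are the same in every file). I need to merge the third column of every file into a new file, and insert a header row that gives, for each column, the name of the file it came from.

The txt files have three columns and look like this: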

-118.33333333333279 40.041666666667908 11.409999847412109
-118.29166666666612 40.041666666667908 11.090000152587891
-118.24999999999946 40.041666666667908 10.920000076293945
-118.20833333333279 40.041666666667908 10.949999809265137

The txt file I am trying to create should look like this:

Name_of_file_1 Name_of_file_2 Name_of_file_3
3rd_Column_File_1 3rd_Column_File_2 3rd_Column_File_3
3rd_Column_File_1 3rd_Column_File_2 3rd_Column_File_3
3rd_Column_File_1 3rd_Column_File_2 3rd_Column_File_3
3rd_Column_File_1 3rd_Column_File_2 3rd_Column_File_3

Is this possible? I could not find a way to do it. Please help!

PEPO

Answer 1

I would use Unix tools:

mkfifo pipe1
mkfifo pipe2
mkfifo pipe3

cut -d " " -f 3 text1.csv > pipe1 &
cut -d " " -f 3 text2.csv > pipe2 &
cut -d " " -f 3 text3.csv > pipe3 &

paste pipe1 pipe2 pipe3 > final.csv

rm pipe1 pipe2 pipe3

Tools used: mkfifo, cut, paste.

You can develop your own shell script from the code sample above.
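With several hundred files, writing one mkfifo line and one cut line per file becomes impractical. The same cut-then-paste idea can be sketched in Python roughly as follows. The function name merge_third_columns, the "*.txt" pattern, and the use of the file name without its extension as the header are assumptions for illustration, not part of the answer above. Like paste, the sketch separates output columns with tabs.

```python
from pathlib import Path


def merge_third_columns(in_dir, out_path, pattern="*.txt"):
    """Write the third column of every matching file side by side.

    Hypothetical helper: the name, the glob pattern and the header
    choice (file name without extension) are illustrative assumptions.
    """
    out_path = Path(out_path)
    # Sort for a stable column order; skip the output file itself
    # in case it lives in the same directory and matches the pattern.
    paths = sorted(p for p in Path(in_dir).glob(pattern)
                   if p.resolve() != out_path.resolve())

    columns = []
    for path in paths:
        with path.open() as fh:
            # str.split() tolerates repeated spaces or tabs between fields.
            values = [line.split()[2] for line in fh if line.strip()]
        # Header first, then the data, so zip() emits the header row first.
        columns.append([path.stem] + values)

    with out_path.open("w") as out:
        # zip(*columns) turns the per-file columns into output rows.
        for row in zip(*columns):
            out.write("\t".join(row) + "\n")
    return [p.name for p in paths]
```

Calling merge_third_columns("data", "final.tsv") would read every .txt file in a directory named data (a placeholder) and return the list of file names in column order.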

Answer 2

Here is one way to do it. The comments are inline in the code:

import csv

# List of your input files
file_names = ['file1', 'file2']

# One list per file: the file name followed by its third column
o_data = []

# Open the files in succession and store the file name as the
# first element, followed by the elements of the third column.
for afile in file_names:
    with open(afile, newline='') as file_h:
        a_list = [afile]
        csv_reader = csv.reader(file_h, delimiter=' ')
        for row in csv_reader:
            if row:  # skip blank lines
                a_list.append(row[2])
    o_data.append(a_list)

# Use zip to turn the per-file columns into rows and write
# them to the output file with the csv writer.
with open('output', 'w', newline='') as op_file:
    csv_writer = csv.writer(op_file, delimiter=' ')
    for row in zip(*o_data):
        csv_writer.writerow(row)
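One caveat with zip: it stops at the shortest input, so a file with fewer rows than the others would silently truncate the merged output. Below is a minimal sketch of a guard, assuming every file is expected to have the same number of rows. The helper name rows_strict is hypothetical and not part of the answer above.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking a column that ran out early


def rows_strict(columns):
    """Yield rows like zip(*columns), but raise if the lengths differ.

    Hypothetical helper for illustration only.
    """
    for index, row in enumerate(zip_longest(*columns, fillvalue=_MISSING)):
        if any(value is _MISSING for value in row):
            raise ValueError(f"columns differ in length at row {index}")
        yield row
```

In the script above, iterating over rows_strict(o_data) in place of the zip call would apply the check. On Python 3.10 and later, zip(*o_data, strict=True) performs the same check without a helper.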

Combining columns from multiple files into one file - Python
