Question

I have two text files. I want to combine some of their columns into a new text file.

I am trying this, but it does not work:

with open('1','r') as first:
    with open('2', 'r') as second:
        data1 = first.readlines()
        for line in data1:
            output = [(item.strip(), line.split(' ')[2]) for item in second]
            f = open("1+2","w")
            f.write("%s  %s\n" .format(output))
            f.close()

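The first text file I have:

The second text file I have:

I want a new file with the column from the first file and the second column from the second file, like this: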

Answer 1

You can iterate over the pairs of lines and join the first column of the first file with the second column of the second file:

with open('file_1.txt') as f1, open('file_2.txt') as f2, open('new_file.txt', 'w') as fr:
    for line in ("{} {}".format(l1.rstrip('\n'), l2.split(maxsplit=1)[1]) for l1, l2 in zip(f1, f2)):
        fr.write(line)

If you are sure the columns are separated by exactly one space, you can also use str.partition like this:

l2.partition(' ')[-1]

Example:

In [28]: with open('file_1.txt') as f1, open('file_2.txt') as f2, open('new_file.txt', 'w') as fr:
    ...:     for line in ("{} {}".format(l1.rstrip('\n'), l2.split(maxsplit=1)[1]) for l1, l2 in zip(f1, f2)):
    ...:         fr.write(line)
    ...:     

In [29]: cat new_file.txt
1 3
2 5
3 7
4 3
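
To see where the two approaches differ, here is a minimal sketch. The sample rows are made up for illustration and are not taken from the asker's files. With a single space the results match; with two spaces, partition leaves a leading space in the second column.

```python
# Hypothetical sample rows, not from the original files.
single = "1 3\n"   # columns separated by one space
double = "1  3\n"  # columns separated by two spaces

# split(maxsplit=1) splits on any run of whitespace.
by_split = double.split(maxsplit=1)[1]
# partition(' ') cuts at the first space only.
by_partition = double.partition(" ")[-1]

print(repr(by_split))      # '3\n'
print(repr(by_partition))  # ' 3\n'

# With exactly one space both give the same second column.
print(single.split(maxsplit=1)[1] == single.partition(" ")[-1])  # True
```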

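By the way, when the two files do not have the same number of lines and you want to keep going until the longer one is exhausted, have a look at itertools.zip_longest instead of zip.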
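
A rough sketch of the zip_longest variant, working on lists of lines so it can be tried without any files. The helper name merge_longest and the "-" placeholder for missing values are my own assumptions, not part of the original answer.

```python
from itertools import zip_longest


def merge_longest(lines1, lines2, fill="-"):
    """Pair lines up to the longer input; `fill` stands in for missing values."""
    merged = []
    for l1, l2 in zip_longest(lines1, lines2, fillvalue=None):
        # Whole line from the first input, or the placeholder if it ran out.
        left = l1.rstrip("\n") if l1 is not None else fill
        # Second column from the second input, or the placeholder.
        parts = l2.split(maxsplit=1) if l2 is not None else []
        right = parts[1].rstrip("\n") if len(parts) > 1 else fill
        merged.append("{} {}".format(left, right))
    return merged


print(merge_longest(["1\n", "2\n", "3\n"], ["1 3\n", "2 5\n"]))
# ['1 3', '2 5', '3 -']
```

With real files you would pass the open file objects instead of the lists and write each merged string followed by a newline.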

Answer 2

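Assuming both files contain numeric data, you can use the numpy module.

loadtxt loads a text file into an array.
savetxt saves an array to a text file. You can also set the format of the saved numbers with the fmt option.

Here is the code: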

import numpy as np

data1 = np.loadtxt("file1.txt")
data2 = np.loadtxt("file2.txt")
print(data1)
# [1. 2. 3. 4.]
print(data2)
# [[1. 3.]
#  [2. 5.]
#  [5. 7.]
#  [7. 3.]]

data2[:, 0] = data1
print(data2)
# [[1. 3.]
#  [2. 5.]
#  [3. 7.]
#  [4. 3.]]
np.savetxt('output.txt', data2, fmt="%d")
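
If you would rather not overwrite data2 in place, one possible variation is to load only the column you need and stack the two columns side by side. This is a sketch under assumptions: io.StringIO objects stand in for the real files (loadtxt and savetxt also accept file-like objects), and the sample values mirror the arrays printed above.

```python
import io

import numpy as np

# In-memory stand-ins for file1.txt and file2.txt (assumed contents).
file1 = io.StringIO("1\n2\n3\n4\n")
file2 = io.StringIO("1 3\n2 5\n5 7\n7 3\n")

col_a = np.loadtxt(file1)             # the single column of the first file
col_b = np.loadtxt(file2, usecols=1)  # only the second column of the second file

# Put the two 1-D arrays next to each other as columns.
merged = np.column_stack((col_a, col_b))

out = io.StringIO()
np.savetxt(out, merged, fmt="%d")
result = out.getvalue()
print(result)
```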

Answer 3

# itertools.izip exists only in Python 2; on Python 3 the built-in zip is already lazy.

with open("file1.txt") as textfile1, open("file2.txt") as textfile2, open('output.txt', 'w') as out:
    for x, y in zip(textfile1, textfile2):
        x = x.strip()
        y = y.split(" ")[1].strip()
        print("{0} {1}".format(x, y))
        out.write("{0} {1}\n".format(x, y))

Answer 4

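Lots of interesting answers on how to do this, but none of them shows how to fix your code. I find we learn better by understanding our own mistakes than by being handed a solution ;)

The tuple in that line has the object names swapped: you want to strip line (which comes from the first file), and to split item (which comes from the second file) and take its second element (that is, [1]).

With these small changes (and a few others described in the comments), we get: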

with open('1','r') as first:
    with open('2', 'r') as second:
        #data1 = first.readlines() #don't do that, iterate over the file
        for line in first: #changed
            output = [(line.strip(), item.split(' ')[1]) for item in second]
            f = open("1+2","a") #use mode "a" for appending - otherwise you're overwriting your data!
            f.write("{}  {}".format(output)) # don't mix python2 and python3 syntax, removed extra newline
            f.close()

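But this is still wrong. Why? Because of for item in second: here you iterate over the whole second file while you are still on the first line of the first file.

We need to change that so only one element is taken. I recommend you read this question and explanations about iterators.

Now let's apply that knowledge: second is an iterator. We need only one element from it at a time, and we have to fetch it manually (we are already inside another loop, and stepping through two things in one loop is tricky), so we will use next(second):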
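
To make the iterator behaviour concrete, here is a small sketch. It uses io.StringIO with made-up contents as a stand-in for the second file, since an open file iterates the same way.

```python
import io

# Stand-in for the open second file (assumed contents).
second = io.StringIO("1 3\n2 5\n")

# The first pass consumes every line...
first_pass = [item for item in second]
# ...so a second pass over the same object finds nothing left.
second_pass = [item for item in second]

print(first_pass)   # ['1 3\n', '2 5\n']
print(second_pass)  # []

# next() pulls exactly one line at a time instead.
second.seek(0)
one_line = next(second)
print(repr(one_line))  # '1 3\n'
```

This is why the list comprehension works only for the first line of the first file: on every later line, second is already exhausted.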

with open('1','r') as first:
    with open('2', 'r') as second:
        for line in first: 
            item = next(second)
            output = (line.strip(), item.split(' ')[1]) #no list comprehension here
            f = open("1+2","a") 
            f.write("{}  {}".format(*output)) #you have to unpack the tuple
            f.close()

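Explanation about unpacking: basically, when you pass just output, Python treats it as a single element and does not know what to put in the other {}. You have to say "hey, treat this iterable (here a 2-element tuple) as individual elements, not as a whole", and that is what * does. :)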
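
A quick sketch of the difference, using a made-up tuple in place of the values read from the files:

```python
# Hypothetical tuple standing in for (line.strip(), item.split(' ')[1]).
output = ("1", "3\n")

# Without *, the whole tuple fills the first {} and nothing is left for the second.
try:
    "{}  {}".format(output)
    error = None
except IndexError as exc:
    error = exc
print(type(error).__name__)  # IndexError

# With *, each element of the tuple becomes its own argument.
unpacked = "{}  {}".format(*output)
print(repr(unpacked))  # '1  3\n'
```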

How do I merge two text files into one?

4 answers: