Question

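I have two .csv files: File1 has 28 columns and 1000 rows, and File2 has 29 columns and 100 rows. The rows have no index, so I don't know which rows are the same in the two files.

For each row in File1, I want to add a new column holding the value of File2's 29th column whenever the other 28 columns are identical.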

File1:
a,b,c,...,EMPTY
x,y,z,...,EMPTY

File2:
a,b,c,...,B1
x,y,z,...,B2

Output:
a,b,c,...,B1
x,y,z,...,B2

So far, I have only just started:

import csv

with open('File1.csv', newline='') as f1:
    reader = csv.reader(f1, delimiter=';')  # the delimiter belongs to the reader, not to open()
    next(reader, None)  # ignore header
    # rows are lists, which are not hashable, so convert each one to a tuple
    test1 = set(tuple(row[0:28]) for row in reader)
with open('File2.csv', newline='') as f2:
    reader = csv.reader(f2, delimiter=';')
    next(reader, None)  # ignore header
    test2 = set(tuple(row[0:28]) for row in reader)

Answer 1

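I suggest loading both csv files with numpy.loadtxt, since it is more efficient and gives the contents a structure. File1 then becomes a 1000 x 28 array f1, and File2 a 100 x 29 array f2.

The second step is to add a new column to File1 with f1 = numpy.column_stack([f1, numpy.zeros(f1.shape[0])]).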
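
As a minimal sketch of these two steps: it assumes the files contain only numeric values, use ';' as the delimiter (as in the question's code), and have one header row. The in-memory text below is made up and merely stands in for File1.csv and File2.csv, with 3 key columns instead of 28.

```python
import io
import numpy as np

# Made-up stand-ins for File1.csv and File2.csv (3 key columns instead of 28).
file1 = io.StringIO("c1;c2;c3\n1;2;3\n4;5;6\n")
file2 = io.StringIO("c1;c2;c3;extra\n4;5;6;9\n")

# loadtxt parses every field as float by default; skiprows=1 drops the header.
# ndmin=2 keeps the result two-dimensional even when a file has one data row.
f1 = np.loadtxt(file1, delimiter=";", skiprows=1, ndmin=2)
f2 = np.loadtxt(file2, delimiter=";", skiprows=1, ndmin=2)

# Append a column of zeros to f1; it will later receive the values from f2.
f1 = np.column_stack([f1, np.zeros(f1.shape[0])])
```

With real files, the StringIO objects would be replaced by the file names. If the columns hold text rather than numbers, loadtxt with its default float dtype will not parse them, and a different loader would be needed.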

Then you can loop over the smaller array:

import numpy as np

for row in f2:
    # find the row(s) in f1 whose first 28 columns match this row
    equal_rows = np.argwhere((f1[:, :28] == row[:28]).all(axis=1))
    for row2 in equal_rows:
        # copy the last column of f2 into the new column of f1
        f1[row2[0], -1] = row[-1]
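
To see the matching loop at work, here is a small self-contained sketch on made-up arrays with 3 key columns instead of 28. The helper name fill_last_column is hypothetical, and it uses a boolean mask in place of np.argwhere, which selects the same rows. The exact == comparison assumes the values match exactly; for floats that may have been rounded differently, np.isclose might be a safer choice.

```python
import numpy as np

def fill_last_column(f1, f2, n_keys):
    """Copy the last column of f2 into f1 wherever the first n_keys columns match."""
    out = f1.copy()  # leave the caller's array untouched
    for row in f2:
        # Boolean mask of the rows in out whose key columns equal those of row.
        mask = (out[:, :n_keys] == row[:n_keys]).all(axis=1)
        out[mask, -1] = row[-1]
    return out

# f1 already carries the extra column of zeros as its last column.
f1 = np.array([[1.0, 2.0, 3.0, 0.0],
               [4.0, 5.0, 6.0, 0.0],
               [7.0, 8.0, 9.0, 0.0]])
f2 = np.array([[4.0, 5.0, 6.0, 10.0],
               [7.0, 8.0, 9.0, 20.0]])

result = fill_last_column(f1, f2, n_keys=3)
print(result[:, -1])
```

Rows of f1 with no counterpart in f2 keep the placeholder value 0 in the last column, so a placeholder that cannot occur in the real data (for example numpy.nan) may be preferable when 0 is a valid value.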

I hope this helps.

How do I add column values from one .csv to another when the other columns are the same?
