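I have two CSV files. In both files the first column is a timestamp, but all the other columns contain different data. Some of the timestamps appear in both files, but in different rows. I want to create a new file that contains every timestamp present in both files, together with the associated data from both files.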
For example:
File 1:
['1', 'John', 'Doe']
['2', 'Jane', 'Deer']
['3', 'Horror', 'Movie']
File 2:
['2', 'Mac']
['3', 'bro']
['4', 'come']
['1', '@mebro']
Output file:
['1', 'John', 'Doe', '@mebro']
['2', 'Jane', 'Deer', 'Mac']
['3', 'Horror', 'Movie', 'bro']
Here is my code so far:
Outfile = []
for row in file2:
    Outfile.append(tuple(row))
if len(file1) >= len(file2):
    for n in xrange(1,len(file2)):
        if file1[0][n] == file2[0][:]:
            Outfile.append(file1[1:8][n])
if len(file2) >= len(file1):
    for n in xrange(1,len(file1)):
        if file1[0][n] == file2[0][:]:
            Outfile.append(file1[1:8][n])
resultFile = open("resultFile.csv","wb")
wr = csv.writer(Outfile, dialect= "excel")
wr.writerows(Outfile)
Answer 0 (score: 0)
Use the pandas library.
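import pandas as pd

df1 = pd.read_csv("path to file 1")
df2 = pd.read_csv("path to file 2")
# 'First column' stands for the name of the timestamp column in both files
result = pd.merge(df1, df2, on='First column', sort=True)
# index=False keeps pandas from writing its row index as an extra column
result.to_csv("path to result file", index=False)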
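`merge` joins the two DataFrames on the specified column. By default it performs an inner join, so only timestamps that appear in both files are kept.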
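A minimal sketch of the pandas approach run on the question's example data. It assumes, unlike the snippet above, that the files have no header row, so `header=None` is passed and pandas numbers the columns 0, 1, 2, ...; the timestamp column is then addressed as `0`. The inline strings stand in for the real files, and `dtype=str` is an assumption that the timestamps should be compared exactly as written.

```python
import io

import pandas as pd

# The question's example data, inlined so the sketch runs on its own
file1_text = "1,John,Doe\n2,Jane,Deer\n3,Horror,Movie\n"
file2_text = "2,Mac\n3,bro\n4,come\n1,@mebro\n"

# No header row assumed: header=None makes pandas label the columns 0, 1, 2, ...
# dtype=str keeps every value, including the timestamps, as a string
df1 = pd.read_csv(io.StringIO(file1_text), header=None, dtype=str)
df2 = pd.read_csv(io.StringIO(file2_text), header=None, dtype=str)

# Inner join on column 0 (the timestamp); sort=True orders rows by the key.
# Timestamp '4' exists only in file 2, so it is dropped.
result = pd.merge(df1, df2, on=0, sort=True)

# to_csv with no path returns the CSV text; pass a filename to write a file
csv_text = result.to_csv(header=False, index=False)
print(csv_text)
```

Because both frames have a column labelled `1`, pandas renames them with its default `_x`/`_y` suffixes; this only affects the column labels, not the data rows written by `to_csv(header=False)`.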
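Answer 1 (score: 0)
The answer given by mds is more efficient. I'm posting this only as supplementary information, because there are several fundamental problems with the way you are trying to use list indexing. This code produces the output list you are looking for and may better illustrate how indexing works. (I added 'example' to the first row of file 2 to show how additional elements get appended.)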
list1 = [['1', 'John', 'Doe'],
         ['2', 'Jane', 'Deer'],
         ['3', 'Horror', 'Movie']]
list2 = [['2', 'Mac', 'example'],
         ['3', 'bro'],
         ['4', 'come'],
         ['1', '@mebro']]
for x in range(len(list1)):
    print("List1 timestamp for consideration: " + str(list1[x][0]))
    for y in range(len(list2)):
        print("Compared to list2 timestamp: " + str(list2[y][0]))
        if list1[x][0] == list2[y][0]:
            print("Match")
            # Append every column after the timestamp from the matching list2 row
            for z in range(1, len(list2[y])):
                list1[x].append(list2[y][z])
The printed output is:
List1 timestamp for consideration: 1
Compared to list2 timestamp: 2
Compared to list2 timestamp: 3
Compared to list2 timestamp: 4
Compared to list2 timestamp: 1
Match
List1 timestamp for consideration: 2
Compared to list2 timestamp: 2
Match
Compared to list2 timestamp: 3
Compared to list2 timestamp: 4
Compared to list2 timestamp: 1
List1 timestamp for consideration: 3
Compared to list2 timestamp: 2
Compared to list2 timestamp: 3
Match
Compared to list2 timestamp: 4
Compared to list2 timestamp: 1
list1 then looks like this:
list1 = [['1', 'John', 'Doe', '@mebro'],
         ['2', 'Jane', 'Deer', 'Mac', 'example'],
         ['3', 'Horror', 'Movie', 'bro']]
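A minimal sketch of the same matching without the nested loop, assuming Python 3 and the standard `csv` module: index `list2` by timestamp in a dict once, then look up each `list1` row, instead of scanning all of `list2` for every row. Unlike the loop above, it builds new rows rather than modifying `list1`, and it drops `list1` rows that have no match (an inner join, like the pandas answer). If file 2 repeats a timestamp, only its last row is used; that is an assumption about the data. The function names `join_on_timestamp` and `join_csv_files`, and the file names in the commented-out call, are hypothetical.

```python
import csv


def join_on_timestamp(rows1, rows2):
    """Inner-join two lists of rows on their first column (the timestamp).

    Rows keep the order of rows1; each match contributes the columns after
    the timestamp from rows2. Assumption: if rows2 repeats a timestamp,
    the last occurrence wins.
    """
    # Build the lookup table once: timestamp -> remaining columns
    extra_by_ts = {row[0]: row[1:] for row in rows2}
    # row + [...] creates a new list, so rows1 itself is not modified
    return [row + extra_by_ts[row[0]] for row in rows1 if row[0] in extra_by_ts]


def join_csv_files(path1, path2, out_path):
    """Read two CSV files, join them on the first column, write the result."""
    # newline="" is how the csv module docs recommend opening CSV files
    with open(path1, newline="") as f1, open(path2, newline="") as f2:
        rows = join_on_timestamp(list(csv.reader(f1)), list(csv.reader(f2)))
    with open(out_path, "w", newline="") as out:
        # The writer wraps the open file object, not the list of rows
        csv.writer(out).writerows(rows)


list1 = [['1', 'John', 'Doe'], ['2', 'Jane', 'Deer'], ['3', 'Horror', 'Movie']]
list2 = [['2', 'Mac', 'example'], ['3', 'bro'], ['4', 'come'], ['1', '@mebro']]
joined = join_on_timestamp(list1, list2)
print(joined)

# Hypothetical file names; uncomment once the files exist:
# join_csv_files("file1.csv", "file2.csv", "resultFile.csv")
```

The dict lookup makes the join roughly linear in the total number of rows, whereas the nested loop compares every row of one list with every row of the other.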