I have multiple csv files that are read into individual data frames based on their names in a directory, like so:
import os
from functools import reduce

import pandas as pd

# ask user for path
path = input('Enter the path for the csv files: ')
os.chdir(path)
# loop over filenames in the (now current) directory and read into individual dataframes
for fname in os.listdir():
    if fname.endswith('Demo.csv'):
        demoRaw = pd.read_csv(fname, encoding='utf-8')
    if fname.endswith('Key2.csv'):
        keyRaw = pd.read_csv(fname, encoding='utf-8')
Then I filter to keep only certain columns:
# filter to keep desired columns only
demo = demoRaw.filter(['Key', 'Sex', 'Race', 'Age'], axis=1)
key = keyRaw.filter(['Key', 'Key2', 'Age'], axis=1)
Then I create a list of the above data frames and use reduce to merge them on Key:
# create list of data frames for combined sheet
dfs = [demo, key]
# merge the list of data frames on the Key
combined = reduce(lambda left,right: pd.merge(left,right,on='Key'), dfs)
Then I drop the auto-generated index column, create an Excel writer, and write to an Excel file:
# use Key as the index so the auto-generated index column is not written
combined.set_index('Key', inplace=True)
# create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('final.xlsx', engine='xlsxwriter')
# write each data frame to its own sheet
combined.to_excel(writer, sheet_name='Combined')
# meds is another data frame that is not shown in this excerpt
meds.to_excel(writer, sheet_name='Meds')
# Close the Pandas Excel writer and output the Excel file
# (writer.save() was removed in pandas 2.0; close() also saves).
writer.close()
The problem is that some files have keys that the other files don't. For example:
Demo file
Key Sex Race Age
1 M W 52
2 F B 25
3 M L 78
Key file
Key Key2 Age
1 7325 52
2 4783 25
3 1367 78
4 9435 21
5 7247 65
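Right now, a row is only included if its key has a match in every file (in other words, rows whose keys are missing from the other files are simply left out). How can I combine all rows from all files, even when the keys don't match? The end result would look like this: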
Key  Sex  Race  Age  Key2  Age
1    M    W     52   7325  52
2    F    B     25   4783  25
3    M    L     78   1367  78
4                    9435  21
5                    7247  65
I don't care whether the empty cells are blank, NaN, #N/A, etc., as long as I can identify them.
Answer 0 (score: 1)
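Replace combined = reduce(lambda left,right: pd.merge(left,right,on='Key'), dfs)
with: combined = pd.merge(demo, key, how='outer', on='Key')
You have to specify how='outer' explicitly to keep every row from both the Key and Demo tables. pd.merge defaults to an inner join, which keeps only the keys present in both.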
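As a minimal sketch of the outer join, the snippet below rebuilds the two sample tables from the question as in-memory data frames (an assumption made so the example runs without the csv files) and merges them on Key. The name combined_reduce is hypothetical and only shows how the original reduce call could be kept if more than two data frames need merging.

```python
from functools import reduce

import pandas as pd

# Sample data copied from the question (built in memory instead of read from csv).
demo = pd.DataFrame({
    'Key': [1, 2, 3],
    'Sex': ['M', 'F', 'M'],
    'Race': ['W', 'B', 'L'],
    'Age': [52, 25, 78],
})
key = pd.DataFrame({
    'Key': [1, 2, 3, 4, 5],
    'Key2': [7325, 4783, 1367, 9435, 7247],
    'Age': [52, 25, 78, 21, 65],
})

# Outer join: keep every Key found in either frame; cells with no match become NaN.
combined = pd.merge(demo, key, how='outer', on='Key')

# Same idea while keeping reduce, which helps when the list holds more than two frames.
dfs = [demo, key]
combined_reduce = reduce(
    lambda left, right: pd.merge(left, right, how='outer', on='Key'), dfs
)

print(combined)
```

Because both tables have an Age column, pandas disambiguates them with its default suffixes, so the merged frame has Age_x (from the demo table) and Age_y (from the key table) rather than two columns both named Age. Rows 4 and 5 have NaN in Sex, Race, and Age_x, which can be located with isna().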