Merging dataframes on a column when the rows differ

Time: 2018-12-03 19:04:09

Tags: python-3.x pandas reduce

I have multiple csv files in a directory that I read into individual dataframes based on their names, like this:

import os
import pandas as pd

# ask user for path and make it the working directory
path = input('Enter the path for the csv files: ')
os.chdir(path)

# loop over filenames in the (now current) directory and read into individual dataframes
for fname in os.listdir('.'):
    if fname.endswith('Demo.csv'):
        demoRaw = pd.read_csv(fname, encoding='utf-8')
    elif fname.endswith('Key2.csv'):
        keyRaw = pd.read_csv(fname, encoding='utf-8')

Then I filter to keep only certain columns:

# filter to keep desired columns only
demo = demoRaw.filter(['Key', 'Sex', 'Race', 'Age'], axis=1)
# 'Key2' rather than a repeated 'Key', matching the key file shown below
key = keyRaw.filter(['Key', 'Key2', 'Age'], axis=1)

Then I create a list of the dataframes above and use reduce to merge them on Key:

from functools import reduce

# create list of data frames for combined sheet
dfs = [demo, key]

# merge the list of data frames on the Key (pd.merge defaults to an inner join)
combined = reduce(lambda left, right: pd.merge(left, right, on='Key'), dfs)

Then I drop the auto-generated index column, create an Excel writer, and write the result to an Excel file:

# drop the auto generated index column by using Key as the index instead
combined.set_index('Key', inplace=True)

# create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('final.xlsx', engine='xlsxwriter')

# write each dataframe to its own sheet
combined.to_excel(writer, sheet_name='Combined')
# meds is another dataframe built elsewhere (not shown here)
meds.to_excel(writer, sheet_name='Meds')

# Close the Pandas Excel writer and output the Excel file.
# (writer.save() was removed in pandas 2.0; close() also writes the file)
writer.close()

The problem is that some files have keys that the other files don't. For example:

Demo file

Key   Sex   Race   Age
1      M     W     52
2      F     B     25
3      M     L     78

Key file

Key   Key2   Age
1     7325   52
2     4783   25
3     1367   78
4     9435   21
5     7247   65

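Right now the merge only includes a row if its key has a match in every file (in other words, it drops rows whose keys are missing from one of the other files). How can I merge all rows from all files, even when the keys don't match? The final result would look like this: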

Key   Sex   Race   Age   Key2   Age
1     M     W      52    7325   52
2     F     B      25    4783   25
3     M     L      78    1367   78
4                        9435   21
5                        7247   65

I don't care whether the empty cells are blank, NaN, #N/A, etc., as long as I can identify them.

1 Answer:

Answer 0: (score: 1)

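Replace

combined = reduce(lambda left,right: pd.merge(left,right,on='Key'), dfs)

with:

combined = pd.merge(demo, key, how='outer', on='Key')

pd.merge performs an inner join by default, which keeps only keys present in both frames. You have to specify how='outer' explicitly to keep every row from both the Key and Demo tables.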
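If you want to keep the reduce-based approach from the question (for example, because the list may grow beyond two frames), passing how='outer' inside the lambda should work as well. The following is a minimal sketch rather than a definitive implementation: the two dataframes are built by hand from the sample tables in the question instead of being read from csv files, and the name missing_demo is only illustrative.

```python
from functools import reduce

import pandas as pd

# Sample data copied from the tables in the question (an assumption: the real
# frames would come from pd.read_csv and the filter step)
demo = pd.DataFrame({
    "Key": [1, 2, 3],
    "Sex": ["M", "F", "M"],
    "Race": ["W", "B", "L"],
    "Age": [52, 25, 78],
})
key = pd.DataFrame({
    "Key": [1, 2, 3, 4, 5],
    "Key2": [7325, 4783, 1367, 9435, 7247],
    "Age": [52, 25, 78, 21, 65],
})

dfs = [demo, key]

# how="outer" keeps every Key found in either frame; cells without a match
# are filled with NaN. Both frames have an "Age" column, so pandas renames
# them to Age_x and Age_y using its default suffixes.
combined = reduce(
    lambda left, right: pd.merge(left, right, on="Key", how="outer"),
    dfs,
)

# Rows whose Key was absent from the demo file show NaN in the demo columns
missing_demo = combined[combined["Sex"].isna()]

print(combined)
```

The unmatched rows (keys 4 and 5 here) come through with NaN in Sex, Race and Age_x, which satisfies the requirement that the empty cells be identifiable. If the default Age_x / Age_y names are not wanted, pd.merge accepts a suffixes argument to choose different ones.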