Question

So I want to merge these CSV files as follows:

file1.csv
Date,Time,Unique1,Common
blah,blah,55,92

file2.csv
Date,Time,Unique2,Common
blah,blah,12,25

I'd like to end up with a pandas DataFrame like this:

Date,Time,Unique1,Unique2,Common (order of columns doesn't matter)
blah,blah,55,12,117

...where 117 is 92 + 25.

I found a post with exactly the same title that contains the following code sample:

import pandas as pd
# all_files is the list of CSV file paths
each_df = (pd.read_csv(f) for f in all_files)
full_df = pd.concat(each_df).groupby(level=0).sum()

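This is exactly what I need, except that it doesn't carry over the Date and Time columns. I assume that's because sum() doesn't know what to do with them.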
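A likely explanation: groupby(level=0) groups rows by their row index (row 0 of every file lands in the same group), so Date and Time are just ordinary columns that get aggregated, and depending on the pandas version sum() either drops such text columns or concatenates the strings. A minimal sketch that groups on the key columns instead, assuming the two files shown above (inlined with io.StringIO here so it runs without any files on disk):

```python
import io

import pandas as pd

# In-memory stand-ins for file1.csv and file2.csv (real code would pass file paths).
csv_texts = [
    "Date,Time,Unique1,Common\nblah,blah,55,92\n",
    "Date,Time,Unique2,Common\nblah,blah,12,25\n",
]
each_df = (pd.read_csv(io.StringIO(text)) for text in csv_texts)

# Group on the key columns instead of level=0, so Date and Time become the group keys
# and are carried through; as_index=False turns them back into ordinary columns.
# Unique1/Unique2 are NaN in the rows that came from the other file; sum() skips NaN.
full_df = pd.concat(each_df).groupby(["Date", "Time"], as_index=False).sum()
print(full_df)
```

Unique1 and Unique2 come back as floats (55.0, 12.0) because concat fills the other file's rows with NaN; add .astype(...) if integers are needed.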

Instead I get:

Unique1,Unique2,Common
<values as expected>

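Please help me carry the Date and Time columns through. They should be identical in every file, so I'm fine with indexing the data by the "Date" and "Time" columns.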

Thanks in advance.

Answer 1

I think you're looking for merge rather than concat. Once you've read each CSV into a DataFrame, you can do the following:

# 'Common' exists in both frames, so merge suffixes it: Common_x from df2, Common_y from df1
new_df = df2.merge(df1, on=['Date', 'Time'], how='inner')
new_df['Common'] = new_df['Common_x'] + new_df['Common_y']
new_df[['Date', 'Time', 'Unique1', 'Unique2', 'Common']]
#output

   Date  Time  Unique1  Unique2  Common
0  blah  blah       55       12     117
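A self-contained version of the same merge, for trying it out without the files. The CSV contents are inlined with io.StringIO, and the explicit suffixes=("_1", "_2") are an assumption of this sketch (the default is ("_x", "_y")) to make it obvious which file each Common column came from:

```python
import io

import pandas as pd

# In-memory stand-ins for file1.csv and file2.csv.
df1 = pd.read_csv(io.StringIO("Date,Time,Unique1,Common\nblah,blah,55,92\n"))
df2 = pd.read_csv(io.StringIO("Date,Time,Unique2,Common\nblah,blah,12,25\n"))

# Inner join on the shared key columns; the overlapping 'Common' column gets suffixed.
new_df = df1.merge(df2, on=["Date", "Time"], how="inner", suffixes=("_1", "_2"))
new_df["Common"] = new_df["Common_1"] + new_df["Common_2"]

# Keep only the requested columns, in the requested order.
new_df = new_df[["Date", "Time", "Unique1", "Unique2", "Common"]]
print(new_df)
```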

You could also try this one-liner:

one_line = df2.merge(df1, on=['Date','Time'], how='inner').\
set_index(['Date', 'Time','Unique1', 'Unique2']).sum(axis=1).reset_index().\
rename(columns = {0:'Common'})

#output

   Date  Time  Unique1  Unique2  Common
0  blah  blah       55       12     117
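The one-liner works by moving every non-Common column into the index so that sum(axis=1) only adds Common_x and Common_y. A variant sketch that selects the suffixed columns by name instead, so the other columns never need to be listed (data inlined with io.StringIO as an assumption; filter, assign and drop are standard DataFrame methods):

```python
import io

import pandas as pd

df1 = pd.read_csv(io.StringIO("Date,Time,Unique1,Common\nblah,blah,55,92\n"))
df2 = pd.read_csv(io.StringIO("Date,Time,Unique2,Common\nblah,blah,12,25\n"))

merged = df2.merge(df1, on=["Date", "Time"], how="inner")

# Pick the suffixed copies of 'Common' (Common_x, Common_y) by regex.
common_cols = merged.filter(regex=r"^Common_").columns

# Add their row-wise sum as 'Common', then drop the suffixed copies.
result = merged.assign(Common=merged[common_cols].sum(axis=1)).drop(columns=common_cols)
print(result)
```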

Answer 2

With more than two DataFrames, this may be the better option:

import pandas as pd
from functools import reduce

# A generator can only be consumed once, so create one for each of the two steps below
all_files1 = (pd.read_csv(f) for f in all_files)
all_files2 = (pd.read_csv(f) for f in all_files)

# Merge the data frames together dropping the 'Common' column and set an index
# Default is an inner join.
split_drop_common = reduce(lambda df1, df2 : df1.merge(df2, on=['Date','Time']),
                [df.drop(columns='Common') for df in all_files1]).set_index(['Date','Time'])
# set up the second group
stage = pd.concat(all_files2)

# Keep only the key columns and 'Common', then sum 'Common' per (Date, Time)
keep_columns = ['Date','Time','Common']
split_only_common = stage[keep_columns].groupby(['Date','Time']).sum()


# Join on indices. Default is an inner join.
# You can specify the join type with kwarg how='join type'
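# reset_index() turns the (Date, Time) index back into ordinary columns, as in the output below
combine = split_drop_common.join(split_only_common).reset_index()
combine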

# Output

   Date  Time  Unique1  Unique2  Common
0  blah  blah       55       12     117
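Because a generator can only be iterated once, the answer above reads every file twice (all_files1 and all_files2). A sketch that reads each file once into a list and reuses it. The CSV contents are inlined with io.StringIO, and the third file (Unique3, with made-up values 7 and 1) is a hypothetical addition to show more than two frames:

```python
import io
from functools import reduce

import pandas as pd

# Hypothetical in-memory CSVs; the third one is invented to show more than two frames.
csv_texts = [
    "Date,Time,Unique1,Common\nblah,blah,55,92\n",
    "Date,Time,Unique2,Common\nblah,blah,12,25\n",
    "Date,Time,Unique3,Common\nblah,blah,7,1\n",
]
# A list, unlike a generator, can be iterated repeatedly, so each file is read once.
frames = [pd.read_csv(io.StringIO(text)) for text in csv_texts]
keys = ["Date", "Time"]

# Inner-merge the unique columns of every frame on the key columns.
uniques = reduce(
    lambda left, right: left.merge(right, on=keys),
    [df.drop(columns="Common") for df in frames],
).set_index(keys)

# Sum 'Common' across all frames for each (Date, Time).
common = pd.concat(frames)[keys + ["Common"]].groupby(keys).sum()

# Join on the shared (Date, Time) index and flatten it back into columns.
combined = uniques.join(common).reset_index()
print(combined)
```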

You can read more about how the reduce function works here.
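In short, functools.reduce applies a two-argument function cumulatively from left to right, so the reduce call in the answer amounts to chaining one .merge() per additional DataFrame. A tiny illustration, independent of pandas:

```python
from functools import reduce

# reduce(f, [a, b, c, d]) computes f(f(f(a, b), c), d).
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4])
print(total)  # 10

# A string-building function makes the left-to-right nesting visible.
steps = reduce(lambda acc, x: f"({acc}+{x})", ["a", "b", "c", "d"])
print(steps)  # (((a+b)+c)+d)
```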

Merging two pandas DataFrames, adding corresponding values pt2
