Merging pandas DataFrames generated inside a loop

Time: 2017-09-12 14:26:12

Tags: python pandas

Suppose I have DataFrames like these (generated inside a loop and appended to a list):

column  row data_503    plate
0   1   A   1   2
1   1   B   2   2
2   1   C   3   2
3   1   D   4   2

column  row data_280    plate
0   1   A   1   2
1   1   B   2   2
2   1   C   3   2
3   1   D   4   2

column  row data_503    plate
0   1   A   1   1
1   1   B   2   1
2   1   C   3   1
3   1   D   4   1

column  row data_280    plate
0   1   A   1   1
1   1   B   2   1
2   1   C   3   1
3   1   D   4   1

I also have a layout file that maps each measurement to a specific condition:

column  row cond    plate
0   1   A   5   1
1   1   B   5   1
2   1   C   5   1
3   1   D   4   1
0   1   A   5   2
1   1   B   5   2
2   1   C   5   2
3   1   D   4   2

I can combine the DataFrames like this:

for df in df_list:
    layout= pd.merge(layout, df, on=['plate', 'row', 'column'], how = 'outer')

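However, I always end up with data_280_x and data_280_y columns, whereas I only want the data_280 and data_503 columns. Changing outer to left makes no difference.

Any ideas how I can get something like this?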
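For context, here is a minimal sketch of where the suffixes come from: pd.merge appends _x and _y whenever both inputs carry a same-named column that is not one of the `on` keys. The tiny frames below (`layout`, `plate1`, `plate2`) are hypothetical stand-ins, not the real data.

```python
import pandas as pd

keys = ["plate", "row", "column"]

# Hypothetical miniature layout and measurements (not the real data)
layout = pd.DataFrame({"column": [1, 1], "row": ["A", "A"], "cond": [5, 5], "plate": [1, 2]})
plate1 = pd.DataFrame({"column": [1], "row": ["A"], "data_280": [1], "plate": [1]})
plate2 = pd.DataFrame({"column": [1], "row": ["A"], "data_280": [1], "plate": [2]})

# First merge: data_280 is new to layout, so it is added under its own name
step1 = pd.merge(layout, plate1, on=keys, how="outer")
print(list(step1.columns))

# Second merge: data_280 now exists on both sides and is not a key,
# so pandas disambiguates the two copies as data_280_x / data_280_y
step2 = pd.merge(step1, plate2, on=keys, how="outer")
print(list(step2.columns))
```

Because the clash is caused by the repeated column name rather than by the join type, switching `how` from outer to left would not be expected to remove the suffixes.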

column  row cond    plate    data_280    data_503
0   1   A   5   1    1    1
1   1   B   5   1    2    2
2   1   C   5   1    3    3
3   1   D   4   1    4    4
0   1   A   5   2    1    1
1   1   B   5   2    2    2
2   1   C   5   2    3    3
3   1   D   4   2    4    4

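4 answers:

Answer 0 (score: 2)

You can combine the _x and _y columns, since they will not have any overlapping values (given that layout df). Once the missing values are filled with 0, add them like this:

df['data_280'] = df['data_280_x'] + df['data_280_y']

Then you can drop the _x and _y columns.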

Update with an example:

import pandas as pd

df1 = pd.DataFrame({"column": [1, 1, 1, 1], "row": ["A", "B", "C", "D"], "plate": [1, 1, 1, 1], "data_503": [4, 5, 6, 7]})
df2 = pd.DataFrame({"column": [1, 1, 1, 1], "row": ["A", "B", "C", "D"], "plate": [1, 1, 1, 1], "data_280": [1, 2, 3, 4]})
df3 = pd.DataFrame({"column": [1, 1, 1, 1], "row": ["A", "B", "C", "D"], "plate": [2, 2, 2, 2], "data_503": [4, 5, 6, 7]})
df4 = pd.DataFrame({"column": [1, 1, 1, 1], "row": ["A", "B", "C", "D"], "plate": [2, 2, 2, 2], "data_280": [1, 2, 3, 4]})
layout = pd.DataFrame({"column": [1, 1, 1, 1, 1, 1, 1, 1], "row": ["A", "B", "C", "D", "A", "B", "C", "D"], "cond": [5, 5, 5, 4, 5, 5, 5, 4], "plate": [1, 1, 1, 1, 2, 2, 2, 2]})

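# Merge each measurement frame with the layout; dropna keeps only the plate it covers
out = []
for df in [df1, df2, df3, df4]:
    part = pd.merge(layout, df, on=['column', 'row', 'plate'], how='outer').dropna()
    out.append(part)

# Outer-merge the per-plate pieces; repeated data columns get _x/_y suffixes
merged = out[0]
for df in out[1:]:
    merged = pd.merge(merged, df, on=['column', 'row', 'plate', 'cond'], how='outer')

# Each _x/_y pair holds a value in only one of the two columns, so fill NaN with 0 and add
merged = merged.fillna(0)

merged['data_280'] = merged['data_280_x'] + merged['data_280_y']
merged['data_503'] = merged['data_503_x'] + merged['data_503_y']

# Pass the labels via columns=; a positional axis argument is rejected by pandas 2.x
merged = merged.drop(columns=['data_280_x', 'data_280_y', 'data_503_x', 'data_503_y'])

This gives me: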

column  cond  plate row  data_280  data_503
0       1     5      1   A       1.0       4.0
1       1     5      1   B       2.0       5.0
2       1     5      1   C       3.0       6.0
3       1     4      1   D       4.0       7.0
4       1     5      2   A       1.0       4.0
5       1     5      2   B       2.0       5.0
6       1     5      2   C       3.0       6.0
7       1     4      2   D       4.0       7.0
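A possible variation on the last step: fillna(0) followed by addition turns a genuinely missing measurement into 0 and would double any value that happened to be present in both columns. As a sketch, the _x/_y pair can instead be coalesced with Series.combine_first, which keeps the value from the first Series and falls back to the second where the first is NaN. The small `merged` frame below is a hypothetical stand-in for the state right after the outer merges, before fillna.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `merged` right after the outer merges (before fillna)
merged = pd.DataFrame({
    "plate": [1, 1, 2, 2],
    "row": ["A", "B", "A", "B"],
    "data_280_x": [1.0, 2.0, np.nan, np.nan],
    "data_280_y": [np.nan, np.nan, 1.0, 2.0],
})

# Take data_280_x where present, otherwise fall back to data_280_y
merged["data_280"] = merged["data_280_x"].combine_first(merged["data_280_y"])
merged = merged.drop(columns=["data_280_x", "data_280_y"])
print(merged)
```

With this approach, a well that has no reading in either column stays NaN instead of silently becoming 0.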

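Answer 1 (score: 1)

I'm not sure this is the most sophisticated solution, but you could first concatenate all the data_503 DataFrames together and all the data_280 DataFrames together, and then merge the two results into the layout.

The code isn't pretty, I have to run to work :)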

df_list = [df1, df2, df3, df4]

# Split the frames by which measurement column they carry
data_280_list = [k for k in df_list if 'data_280' in k.columns]
data_503_list = [k for k in df_list if 'data_503' in k.columns]

# Stack the per-plate frames of each measurement into one frame
df_503 = pd.concat(data_503_list)
df_280 = pd.concat(data_280_list)

for df in [df_503, df_280]:
    layout= pd.merge(layout, df, on=['plate', 'row', 'column'], how = 'outer')

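Answer 2 (score: 1)

After merging, strip the suffixes, apply ffill along the columns so that each column is filled from the one before it, and drop the duplicated columns while keeping the last one of each pair, which is then completely filled, i.e.

# Remove only a trailing _x/_y (str.strip removes characters, not a suffix)
layout.columns = layout.columns.str.replace(r'_[xy]$', '', regex=True)
ordered = layout.sort_index(axis=1)
ordered.ffill(axis=1).loc[:, ~ordered.columns.duplicated(keep='last')]

Output: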

   column cond data_280 data_503 plate row
0      1    5        1        1     1   A
1      1    5        2        2     1   B
2      1    5        3        3     1   C
3      1    4        4        4     1   D
4      1    5        1        1     2   A
5      1    5        2        2     2   B
6      1    5        3        3     2   C
7      1    4        4        4     2   D

Answer 3 (score: 0)

Use pd.concat to combine the list of DataFrames into one large DataFrame.
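The answer gives no code, so the following is only one possible reading of it: stack all frames with pd.concat, collapse them to one row per well, and merge into the layout once. The `df_list` and `layout` frames are hypothetical stand-ins shaped like the question's data, and the groupby step is an assumption about how the stacked rows would be combined.

```python
import pandas as pd

keys = ["plate", "row", "column"]
rows = ["A", "B", "C", "D"]

# Hypothetical stand-ins for the frames collected in the loop
df_list = [
    pd.DataFrame({"column": 1, "row": rows, name: [1, 2, 3, 4], "plate": plate})
    for plate in (2, 1)
    for name in ("data_503", "data_280")
]
layout = pd.DataFrame({
    "column": 1,
    "row": rows * 2,
    "cond": [5, 5, 5, 4] * 2,
    "plate": [1] * 4 + [2] * 4,
})

# Stack everything; each well now has one row per measurement, with NaN elsewhere
stacked = pd.concat(df_list, ignore_index=True)

# Collapse to one row per well: GroupBy.first keeps the first non-null value per column
wide = stacked.groupby(keys, as_index=False).first()

# Each data column exists only once now, so the merge adds no _x/_y suffixes
result = layout.merge(wide, on=keys, how="left")
print(result)
```

Merging once against a frame in which every measurement column appears a single time sidesteps the name clash that produced the suffixes in the loop.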