Suppose I have DataFrames like these (generated inside a loop and appended to a list):
column row data_503 plate
0 1 A 1 2
1 1 B 2 2
2 1 C 3 2
3 1 D 4 2
column row data_280 plate
0 1 A 1 2
1 1 B 2 2
2 1 C 3 2
3 1 D 4 2
column row data_503 plate
0 1 A 1 1
1 1 B 2 1
2 1 C 3 1
3 1 D 4 1
column row data_280 plate
0 1 A 1 1
1 1 B 2 1
2 1 C 3 1
3 1 D 4 1
I also have a layout file that maps the measurements to specific conditions:
column row cond plate
0 1 A 5 1
1 1 B 5 1
2 1 C 5 1
3 1 D 4 1
0 1 A 5 2
1 1 B 5 2
2 1 C 5 2
3 1 D 4 2
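I can combine the DataFrames like this:
for df in df_list:
    layout = pd.merge(layout, df, on=['plate', 'row', 'column'], how='outer')
However, I always end up with data_280_x and data_280_y columns, whereas I only want data_280 and data_503 columns. Changing outer to left doesn't change anything.
Any ideas how I can get something like this?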
column row cond plate data_280 data_503
0 1 A 5 1 1 1
1 1 B 5 1 2 2
2 1 C 5 1 3 3
3 1 D 4 1 4 4
0 1 A 5 2 1 1
1 1 B 5 2 2 2
2 1 C 5 2 3 3
3 1 D 4 2 4 4
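Answer 0: (score: 2)
You can combine the _x and _y columns, since they won't have any overlapping values (given that layout df). Fill the NaN values with 0 first, otherwise the sum is NaN as well:
df['data_280'] = df['data_280_x'].fillna(0) + df['data_280_y'].fillna(0)
You can then drop the _x and _y columns.
Update with an example: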
import pandas as pd

df1 = pd.DataFrame({"column": [1, 1, 1, 1], "row": ["A", "B", "C", "D"], "plate": [1, 1, 1, 1], "data_503": [4, 5, 6, 7]})
df2 = pd.DataFrame({"column": [1, 1, 1, 1], "row": ["A", "B", "C", "D"], "plate": [1, 1, 1, 1], "data_280": [1, 2, 3, 4]})
df3 = pd.DataFrame({"column": [1, 1, 1, 1], "row": ["A", "B", "C", "D"], "plate": [2, 2, 2, 2], "data_503": [4, 5, 6, 7]})
df4 = pd.DataFrame({"column": [1, 1, 1, 1], "row": ["A", "B", "C", "D"], "plate": [2, 2, 2, 2], "data_280": [1, 2, 3, 4]})
layout = pd.DataFrame({"column": [1, 1, 1, 1, 1, 1, 1, 1], "row": ["A", "B", "C", "D", "A", "B", "C", "D"], "cond": [5, 5, 5, 4, 5, 5, 5, 4], "plate": [1, 1, 1, 1, 2, 2, 2, 2]})
# Attach the layout to each measurement frame; dropna() keeps only that frame's plate
out = []
for df in [df1, df2, df3, df4]:
    part = pd.merge(layout, df, on=['column', 'row', 'plate'], how='outer').dropna()
    out.append(part)
# Merge the per-plate pieces; repeated measurement columns get _x/_y suffixes
merged = out[0]
for df in out[1:]:
    merged = pd.merge(merged, df, on=['column', 'row', 'plate', 'cond'], how='outer')
# Each row has a value in only one column of an _x/_y pair, so fill NaN with 0 and add
merged = merged.fillna(0)
merged['data_280'] = merged['data_280_x'] + merged['data_280_y']
merged['data_503'] = merged['data_503_x'] + merged['data_503_y']
merged = merged.drop(columns=['data_280_x', 'data_280_y', 'data_503_x', 'data_503_y'])
This gives me:
column cond plate row data_280 data_503
0 1 5 1 A 1.0 4.0
1 1 5 1 B 2.0 5.0
2 1 5 1 C 3.0 6.0
3 1 4 1 D 4.0 7.0
4 1 5 2 A 1.0 4.0
5 1 5 2 B 2.0 5.0
6 1 5 2 C 3.0 6.0
7 1 4 2 D 4.0 7.0
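As a possible alternative to filling with 0 and adding the _x/_y pairs, here is a minimal sketch that indexes every frame by the key columns and folds them together with DataFrame.combine_first, so the suffixed columns never appear. The helper make_frame and the names keys, frames and combined are my own, made up for illustration, and the sketch assumes each (plate, row, column) key carries at most one value per measurement.

```python
from functools import reduce

import pandas as pd

keys = ["plate", "row", "column"]
rows = ["A", "B", "C", "D"]


def make_frame(plate, value_col, values):
    """Hypothetical helper: build one per-plate measurement frame."""
    return pd.DataFrame({"column": 1, "row": rows, "plate": plate, value_col: values})


frames = [
    make_frame(1, "data_503", [4, 5, 6, 7]),
    make_frame(1, "data_280", [1, 2, 3, 4]),
    make_frame(2, "data_503", [4, 5, 6, 7]),
    make_frame(2, "data_280", [1, 2, 3, 4]),
]
layout = pd.DataFrame(
    {"column": 1, "row": rows * 2, "cond": [5, 5, 5, 4] * 2, "plate": [1] * 4 + [2] * 4}
)

# combine_first takes the union of rows and columns and fills the holes of the
# left frame with values from the right one, so no _x/_y columns are created.
combined = reduce(
    lambda left, right: left.combine_first(right),
    [frame.set_index(keys) for frame in frames],
)

# Attach the combined measurements to the layout.
result = layout.merge(combined.reset_index(), on=keys, how="left")
print(result)
```

If two frames did carry different values for the same key and measurement, combine_first would silently keep the first one, so this only fits data shaped like the example.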
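Answer 1: (score: 1)
I'm not sure this is the most elegant solution, but you can first concatenate all the data_503 DataFrames and all the data_280 DataFrames, and then merge the two results into the layout.
The code isn't pretty, I have to run to work :)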
df_list = [df1, df2, df3, df4]
data_280_list = []
for k in df_list:
    if 'data_280' in k.columns:
        data_280_list.append(k)
data_503_list = []
for k in df_list:
    if 'data_503' in k.columns:
        data_503_list.append(k)
df_503 = pd.concat(data_503_list)
df_280 = pd.concat(data_280_list)
for df in [df_503, df_280]:
    layout = pd.merge(layout, df, on=['plate', 'row', 'column'], how='outer')
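For anyone who wants to run this end to end, here is a compact, self-contained sketch of the same idea: stack the frames per measurement, then merge each stack into the layout once. The sample values follow the question; the helper make_frame and the names keys, stacked and merged are assumptions of mine, not part of the original code.

```python
import pandas as pd

keys = ["plate", "row", "column"]
rows = ["A", "B", "C", "D"]


def make_frame(plate, value_col):
    """Hypothetical helper: one per-plate frame with the question's values."""
    return pd.DataFrame({"column": 1, "row": rows, "plate": plate, value_col: [1, 2, 3, 4]})


df_list = [
    make_frame(2, "data_503"),
    make_frame(2, "data_280"),
    make_frame(1, "data_503"),
    make_frame(1, "data_280"),
]
layout = pd.DataFrame(
    {"column": 1, "row": rows * 2, "cond": [5, 5, 5, 4] * 2, "plate": [1] * 4 + [2] * 4}
)

merged = layout
for name in ["data_503", "data_280"]:
    # Stack all plates of one measurement into a single frame.
    stacked = pd.concat([df for df in df_list if name in df.columns], ignore_index=True)
    # Each measurement is merged exactly once, so no _x/_y suffixes are produced.
    merged = merged.merge(stacked, on=keys, how="left")

print(merged)
```

The sketch uses how="left" so the row order of the layout is kept; with the sample data an outer merge should return the same rows.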
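Answer 2: (score: 1)
After merging, strip the suffixes from the column names, apply ffill along the columns so that each duplicated column is filled from the one before it, and then drop the duplicated columns, keeping the last one because it is the fully populated one. Note that str.strip('_x') removes any leading or trailing '_' and 'x' characters rather than the suffix itself, so a regex is safer, i.e.
layout.columns = layout.columns.str.replace(r'_[xy]$', '', regex=True)
sorted_layout = layout.sort_index(axis=1)
sorted_layout.ffill(axis=1).loc[:, ~sorted_layout.columns.duplicated(keep='last')]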
Output:
column cond data_280 data_503 plate row
0 1 5 1 1 1 A
1 1 5 2 2 1 B
2 1 5 3 3 1 C
3 1 4 4 4 1 D
4 1 5 1 1 2 A
5 1 5 2 2 2 B
6 1 5 3 3 2 C
7 1 4 4 4 2 D
Answer 3: (score: 0)
Use pd.concat to combine the list of DataFrames into one large DataFrame.
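Since this answer gives no code, here is one way it might look; treat it as a sketch, not the answerer's exact method. pd.concat on its own leaves each measurement in a separate row with NaN in the other column, so the sketch adds a groupby on the key columns and takes the first non-null value per group, which is my own addition. The helper make_frame and the names keys, stacked and wide are hypothetical, and the sketch assumes at most one value per key and measurement.

```python
import pandas as pd

keys = ["plate", "row", "column"]
rows = ["A", "B", "C", "D"]


def make_frame(plate, value_col, values):
    """Hypothetical helper: build one per-plate measurement frame."""
    return pd.DataFrame({"column": 1, "row": rows, "plate": plate, value_col: values})


frames = [
    make_frame(1, "data_503", [4, 5, 6, 7]),
    make_frame(1, "data_280", [1, 2, 3, 4]),
    make_frame(2, "data_503", [4, 5, 6, 7]),
    make_frame(2, "data_280", [1, 2, 3, 4]),
]
layout = pd.DataFrame(
    {"column": 1, "row": rows * 2, "cond": [5, 5, 5, 4] * 2, "plate": [1] * 4 + [2] * 4}
)

# One tall frame: every row holds either data_503 or data_280, NaN in the other.
stacked = pd.concat(frames, ignore_index=True)

# Collapse to one row per well; first() skips NaN, so each column keeps its value.
wide = stacked.groupby(keys, as_index=False).first()

# A single merge against the layout, so no _x/_y columns can appear.
result = layout.merge(wide, on=keys, how="left")
print(result)
```

Because the concatenated columns contain NaN before the groupby, the measurement values come back as floats.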