I'm new to the pandas module and have a small question about its merge method. Suppose I have two separate tables, as shown below:
Original_DataFrame
machine weekNum Percent
M1 2 75
M1 5 80
M1 8 95
M1 10 90
New_DataFrame
machine weekNum Percent
M1 1 100
M1 2 100
M1 3 100
M1 4 100
M1 5 100
M1 6 100
M1 7 100
M1 8 100
M1 9 100
M1 10 100
I used the pandas merge method, as follows:
pd.merge(orig_df, new_df, on='weekNum', how='left')
This is what I got:
machine weekNum Percent_x Percent_y
0 M1 2 75 100
1 M1 5 80 100
2 M1 8 95 100
3 M1 10 90 100
However, I want to fill in the skipped weekNum values and use 100 for those rows, to get the desired output shown below.
machine weekNum Percent
M1 1 100
M1 2 75
M1 3 100
M1 4 100
M1 5 80
M1 6 100
M1 7 100
M1 8 95
M1 9 100
M1 10 90
Can anyone tell me how to proceed?
Answer 0 (score: 1)
I think you need combine_first, but you first need to call set_index on the common columns:
df11 = df1.set_index(['machine','weekNum'])
df22 = df2.set_index(['machine','weekNum'])
df = df11.combine_first(df22).astype(int).reset_index()
print(df)
machine weekNum Percent
0 M1 1 100
1 M1 2 75
2 M1 3 100
3 M1 4 100
4 M1 5 80
5 M1 6 100
6 M1 7 100
7 M1 8 95
8 M1 9 100
9 M1 10 90
df.plot.bar('weekNum', 'Percent')
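For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same combine_first approach. The two frames are rebuilt by hand from the tables in the question, and the names df1, df2 and keys are only illustrative assumptions.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the asker's frames)
df1 = pd.DataFrame({
    'machine': ['M1'] * 4,
    'weekNum': [2, 5, 8, 10],
    'Percent': [75, 80, 95, 90],
})
df2 = pd.DataFrame({
    'machine': ['M1'] * 10,
    'weekNum': list(range(1, 11)),
    'Percent': [100] * 10,
})

keys = ['machine', 'weekNum']

# Values from df1 take priority; rows missing from df1 are taken from df2
df = (
    df1.set_index(keys)
       .combine_first(df2.set_index(keys))
       .astype(int)
       .reset_index()
)
print(df)
```

Indexing both frames by the shared key columns is what lets combine_first align rows by machine and week rather than by row position.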
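Edit:
To add labels:
import matplotlib.pyplot as plt

# Draw on an explicit Axes so that figsize applies to the bar chart itself
fig, ax = plt.subplots(figsize=(12, 8))
df.plot.bar(x='weekNum', y='Percent', ax=ax)
# Write each bar's value just above the bar
for rect, label in zip(ax.patches, df['Percent']):
    height = rect.get_height()
    ax.text(rect.get_x() + rect.get_width() / 2, height + 1, str(label), ha='center', va='bottom')
ax.set_ylim(top=120)  # leave headroom for the labels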
Answer 1 (score: 0)
Not as elegant as the other solution, but it works anyway:
# join
merged = pd.merge(data1, data2, on=['machine','weekNum'], how='outer')
# combine percent columns
merged['Percent'] = merged['Percent_x'].fillna(merged['Percent_y'])
# remove extra columns
result = merged[['machine','weekNum', 'Percent']]
Result:
machine weekNum Percent
M1 2 75
M1 5 80
M1 8 95
M1 10 90
M1 1 100
M1 3 100
M1 4 100
M1 6 100
M1 7 100
M1 9 100
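The result above is not in week order. Below is a minimal sketch of the same outer-merge approach with an explicit sort added, assuming data1 and data2 hold the two tables from the question (rebuilt here by hand). Depending on the pandas version, the outer merge may already return the keys sorted, in which case the sort is simply a no-op.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the asker's frames)
data1 = pd.DataFrame({
    'machine': ['M1'] * 4,
    'weekNum': [2, 5, 8, 10],
    'Percent': [75, 80, 95, 90],
})
data2 = pd.DataFrame({
    'machine': ['M1'] * 10,
    'weekNum': list(range(1, 11)),
    'Percent': [100] * 10,
})

# Outer join keeps every (machine, weekNum) pair found in either frame
merged = pd.merge(data1, data2, on=['machine', 'weekNum'], how='outer')

# Prefer the original value; fall back to the default where it is missing
merged['Percent'] = merged['Percent_x'].fillna(merged['Percent_y']).astype(int)

# Drop the helper columns and put the rows in week order
result = (
    merged[['machine', 'weekNum', 'Percent']]
    .sort_values(['machine', 'weekNum'])
    .reset_index(drop=True)
)
print(result)
```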
Answer 2 (score: 0)
You can try this. Depending on your overall goal, it may not be "programmatic" enough.