I have a dataframe, df1, that looks like this:
Observed PeakFlow (cfs) Modelled Peak Flow (cfs)
9.78768 10.93963
1.999368 2.037152
11.63652 8.541796
3.237471 3.970588
54.04929 22.94427
4.68197 3.139319
16.41346 12.17337
14.97399 7.224458
2.114172 5.775542
22.80021 22.69659
25.3347 13.0805
33.4092 11.3452
13.81051 7.640867
6.794793 4.26161
9.008561 6.634675
5.957804 4.176471
2.337406 2.071208
32.6419 4.368421
3.567871 2.894737
5.776844 3.0387
39.54993 5.849845
4.511765 2.28483
6.989101 3.218266
14.63979 9.024768
I also have another dataframe, df2, that looks like this:
1-1 Match | -15% Peak Flow | +25% Peak Flow
-----------------------------------------------------
X-Axis| Y-Axis | X-Axis| Y-Axis | X-Axis| Y-Axis
-----------------------------------------------------
0 | 0 | 0 | 0 | 0 | 0
200 | 200 | 200 | 170 | 200 | 250
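I would like a single scatter plot that combines these two dataframes. The expected output is shown in the figure below. How can this be done?
When I load df2 from a CSV file, it looks like the figure below. How can I remove the Unnamed labels and get merged column headers, as shown in the code?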
Answer 0 (score: 1)
You can use:
print (df2)
1-1 Match -15% Peak Flow +25% Peak Flow
X-Axis Y-Axis X-Axis Y-Axis X-Axis Y-Axis
0 0 0 0 0 0 0
1 200 200 200 170 200 250
print (df2.columns)
MultiIndex(levels=[['+25% Peak Flow', '-15% Peak Flow', '1-1 Match'], ['X-Axis', 'Y-Axis']],
labels=[[2, 2, 1, 1, 0, 0], [0, 1, 0, 1, 0, 1]])
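ax = df1.plot.scatter(x='Modelled Peak Flow (cfs)', y='Observed PeakFlow (cfs)', s=50)
# Draw one line per top-level column label of df2 on the same axes.
# Note: groupby(..., axis=1) is deprecated as of pandas 2.1.
for i, df3 in df2.groupby(level=0, axis=1):
    df3 = df3.set_index([(i, 'X-Axis')])
    df3.index.name = None
    df3.columns = [i]
    # print (df3)
    df3.plot(ax=ax)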
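For readers who want something they can run end to end, here is a minimal self-contained sketch of the same idea. It is not the answer's exact code: it calls matplotlib directly instead of `DataFrame.plot`, uses only the first four rows of df1 from the question, and rebuilds df2 by hand with `pd.MultiIndex.from_product`. The `matplotlib.use("Agg")` line is an assumption that lets it run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to open a window
import matplotlib.pyplot as plt
import pandas as pd

# First four rows of df1 from the question (truncated for brevity).
df1 = pd.DataFrame({
    "Observed PeakFlow (cfs)": [9.78768, 1.999368, 11.63652, 54.04929],
    "Modelled Peak Flow (cfs)": [10.93963, 2.037152, 8.541796, 22.94427],
})

# df2 rebuilt by hand with two-level column labels.
columns = pd.MultiIndex.from_product(
    [["1-1 Match", "-15% Peak Flow", "+25% Peak Flow"], ["X-Axis", "Y-Axis"]]
)
df2 = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0], [200, 200, 200, 170, 200, 250]],
    columns=columns,
)

fig, ax = plt.subplots()

# Scatter of modelled (x) against observed (y) peak flow.
ax.scatter(df1["Modelled Peak Flow (cfs)"], df1["Observed PeakFlow (cfs)"], s=50)

# One reference line per top-level label, in order of appearance.
for name in df2.columns.get_level_values(0).unique():
    pair = df2[name]  # sub-frame with columns 'X-Axis' and 'Y-Axis'
    ax.plot(pair["X-Axis"], pair["Y-Axis"], label=name)

ax.set_xlabel("Modelled Peak Flow (cfs)")
ax.set_ylabel("Observed PeakFlow (cfs)")
ax.legend()
```

Call `fig.savefig(...)` or `plt.show()` afterwards to view the result. Selecting `df2[name]` on the top level avoids grouping along the column axis.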
If you need custom colors and markers:
ax = df1.plot.scatter(x='Modelled Peak Flow (cfs)',
                      y='Observed PeakFlow (cfs)',
                      s=50,
                      marker='d',
                      color='r')
df21 = df2.xs('1-1 Match', axis=1).set_index('X-Axis')
df21.index.name = None
df21.columns = ['1-1 Match']
df21.plot(c='black', ax=ax)
df22 = df2.xs('-15% Peak Flow', axis=1).set_index('X-Axis')
df22.index.name = None
df22.columns = ['-15% Peak Flow']
df22.plot(c='blue',ls='--', ax=ax)
df23 = df2.xs('+25% Peak Flow', axis=1).set_index('X-Axis')
df23.index.name = None
df23.columns = ['+25% Peak Flow']
df23.plot(c='blue',ls='--', ax=ax)
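The three near-identical blocks above could also be driven from a small style table. The following is only a sketch under stated assumptions: the `styles` dictionary is a hypothetical helper, df1 is truncated to four rows, df2 is rebuilt by hand, and the lines are drawn with matplotlib's `ax.plot` rather than `DataFrame.plot`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to open a window
import matplotlib.pyplot as plt
import pandas as pd

# First four rows of df1 from the question (truncated for brevity).
df1 = pd.DataFrame({
    "Observed PeakFlow (cfs)": [9.78768, 1.999368, 11.63652, 54.04929],
    "Modelled Peak Flow (cfs)": [10.93963, 2.037152, 8.541796, 22.94427],
})

columns = pd.MultiIndex.from_product(
    [["1-1 Match", "-15% Peak Flow", "+25% Peak Flow"], ["X-Axis", "Y-Axis"]]
)
df2 = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0], [200, 200, 200, 170, 200, 250]],
    columns=columns,
)

# Hypothetical style table: one entry per reference line.
styles = {
    "1-1 Match": {"color": "black", "linestyle": "-"},
    "-15% Peak Flow": {"color": "blue", "linestyle": "--"},
    "+25% Peak Flow": {"color": "blue", "linestyle": "--"},
}

fig, ax = plt.subplots()

# Red diamond markers for the observations, as in the answer.
ax.scatter(df1["Modelled Peak Flow (cfs)"], df1["Observed PeakFlow (cfs)"],
           s=50, marker="d", color="r")

for name, style in styles.items():
    pair = df2.xs(name, axis=1)  # columns 'X-Axis' and 'Y-Axis'
    ax.plot(pair["X-Axis"], pair["Y-Axis"], label=name, **style)

ax.legend()
```

Adding another reference line then only needs a new entry in `styles` and a matching pair of columns in df2.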
EDIT1:
When df2 is read from a CSV file, the first level of the MultiIndex columns contains Unnamed labels, so you need:
import pandas as pd
df2 = pd.read_csv('file', header=[0,1])
print (df2)
1-1 Match Unnamed: 1_level_0 -15% Peak Flow Unnamed: 3_level_0 \
X-Axis Y-Axis X-Axis Y-Axis
0 0 0 0 0
1 200 200 200 170
+25% Peak Flow Unnamed: 5_level_0
X-Axis Y-Axis
0 0 0
1 200 250
cols = df2.columns.get_level_values(0)
cols = cols.where(~cols.str.contains('Unnamed')).to_series().ffill().tolist()
df2.columns = [cols, df2.columns.get_level_values(1)]
df2 = df2.sort_index(level=0, axis=1)
print (df2)
+25% Peak Flow -15% Peak Flow 1-1 Match
X-Axis Y-Axis X-Axis Y-Axis X-Axis Y-Axis
0 0 0 0 0 0 0
1 200 250 200 170 200 200
print (df2.columns)
MultiIndex(levels=[['+25% Peak Flow', '-15% Peak Flow', '1-1 Match'],
['X-Axis', 'Y-Axis']],
labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])
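The header cleanup can also be tried without a file on disk. The sketch below assumes the CSV looks like the `CSV_TEXT` string (the real file is not shown in the question) and wraps the forward-fill step in a hypothetical helper, `fill_unnamed_top_level`. It skips the final `sort_index` call, so the columns stay in file order.

```python
import io

import pandas as pd

# Assumed file contents: blank cells after each group label.
CSV_TEXT = """1-1 Match,,-15% Peak Flow,,+25% Peak Flow,
X-Axis,Y-Axis,X-Axis,Y-Axis,X-Axis,Y-Axis
0,0,0,0,0,0
200,200,200,170,200,250
"""


def fill_unnamed_top_level(columns):
    """Replace 'Unnamed: ...' labels in level 0 with the label to their left."""
    top = pd.Series(columns.get_level_values(0))
    # Blank out the placeholder labels, then forward-fill from the left.
    top = top.mask(top.str.startswith("Unnamed")).ffill()
    return pd.MultiIndex.from_arrays([top, columns.get_level_values(1)])


df2 = pd.read_csv(io.StringIO(CSV_TEXT), header=[0, 1])
df2.columns = fill_unnamed_top_level(df2.columns)

print(df2.columns.tolist())
```

After this step, `df2['-15% Peak Flow']` returns the `X-Axis` and `Y-Axis` columns for that line, which is the form the plotting code expects.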