How do I create a scatter plot of multiple variables from two DataFrames?

Time: 2017-02-22 07:24:06

Tags: python pandas matplotlib

I have a DataFrame, df1, that looks like this:

Observed PeakFlow (cfs)      Modelled Peak Flow (cfs)
     9.78768                       10.93963
     1.999368                      2.037152
     11.63652                      8.541796
     3.237471                      3.970588
     54.04929                      22.94427
     4.68197                       3.139319
     16.41346                      12.17337
     14.97399                      7.224458
     2.114172                      5.775542
     22.80021                      22.69659
     25.3347                       13.0805
     33.4092                       11.3452
     13.81051                      7.640867
     6.794793                      4.26161
     9.008561                      6.634675
     5.957804                      4.176471
     2.337406                      2.071208
     32.6419                       4.368421
     3.567871                      2.894737
     5.776844                       3.0387
     39.54993                      5.849845
     4.511765                       2.28483
     6.989101                      3.218266
     14.63979                      9.024768

I also have another DataFrame, df2, that looks like this:

        1-1 Match    |  -15% Peak Flow  |   +25% Peak Flow
      ----------------------------------------------------- 
      X-Axis| Y-Axis |  X-Axis| Y-Axis  |   X-Axis| Y-Axis
      -----------------------------------------------------
          0 |  0     |     0  |   0     |      0  |   0
        200 | 200    |    200 |  170    |     200 |  250

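I would like a single scatter plot that combines these two DataFrames. The expected output is shown in the image below. How can I do this?

[image: expected output]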

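When I load df2 from a CSV file, it looks like the image below. How can I remove the "Unnamed" parts and get merged column headers, as in the code?

[image: df2 as loaded from the CSV file]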

1 answer:

Answer 0 (score: 1)

You can use:

print (df2)
 1-1 Match        -15% Peak Flow        +25% Peak Flow       
     X-Axis Y-Axis         X-Axis Y-Axis         X-Axis Y-Axis
0         0      0              0      0              0      0
1       200    200            200    170            200    250

print (df2.columns)
MultiIndex(levels=[['+25% Peak Flow', '-15% Peak Flow', '1-1 Match'], ['X-Axis', 'Y-Axis']],
           labels=[[2, 2, 1, 1, 0, 0], [0, 1, 0, 1, 0, 1]])
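
For readers who want to reproduce the frame printed above without a CSV file, here is a minimal sketch of one way to build it. The column labels and values are copied from the question; the use of `pd.MultiIndex.from_product` is an assumption about how to construct it, not something stated in the original post.

```python
import pandas as pd

# Two-level column header: (line name, axis). Labels come from the question.
columns = pd.MultiIndex.from_product(
    [['1-1 Match', '-15% Peak Flow', '+25% Peak Flow'], ['X-Axis', 'Y-Axis']]
)

# Each reference line is defined by two points: the origin and x = 200.
df2 = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0],
     [200, 200, 200, 170, 200, 250]],
    columns=columns,
)

print(df2)
print(df2.columns)
```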

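# Scatter df1 first, then draw each df2 group as a line on the same axes
ax = df1.plot.scatter(x='Modelled Peak Flow (cfs)', y='Observed PeakFlow (cfs)', s=50)

for i, df3 in df2.groupby(level=0, axis=1):
    # Use the group's X-Axis column as the index so it supplies the x values
    df3 = df3.set_index([(i, 'X-Axis')])
    df3.index.name = None
    # Flatten the remaining Y-Axis column name so the legend shows the group name
    df3.columns = [i]
#    print (df3)
    df3.plot(ax=ax)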

[image: graph]
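
As far as I know, newer pandas releases emit a deprecation warning for `groupby(..., axis=1)`. The sketch below shows one possible alternative that iterates over the top-level column labels instead. It assumes a df2 with the two-level columns shown above and uses only the first four rows of df1 from the question to stay short. The `Agg` backend line is there only so that the snippet runs without a display and can be dropped in an interactive session.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove for interactive use
import pandas as pd

# First four rows of df1 from the question (shortened for illustration).
df1 = pd.DataFrame({
    'Observed PeakFlow (cfs)': [9.78768, 1.999368, 11.63652, 3.237471],
    'Modelled Peak Flow (cfs)': [10.93963, 2.037152, 8.541796, 3.970588],
})

columns = pd.MultiIndex.from_product(
    [['1-1 Match', '-15% Peak Flow', '+25% Peak Flow'], ['X-Axis', 'Y-Axis']]
)
df2 = pd.DataFrame([[0, 0, 0, 0, 0, 0],
                    [200, 200, 200, 170, 200, 250]], columns=columns)

ax = df1.plot.scatter(x='Modelled Peak Flow (cfs)',
                      y='Observed PeakFlow (cfs)', s=50)

# unique() keeps the labels in their order of appearance in the header.
for name in df2.columns.get_level_values(0).unique():
    line = df2[name]  # sub-frame with plain 'X-Axis' / 'Y-Axis' columns
    ax.plot(line['X-Axis'], line['Y-Axis'], label=name)

ax.legend()
```

Selecting `df2[name]` drops the top header level, so each reference line can be passed straight to `ax.plot` without resetting the index.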

If you need custom colors and markers:

ax = df1.plot.scatter(x='Modelled Peak Flow (cfs)', 
                      y='Observed PeakFlow (cfs)', 
                      s=50, 
                      marker='d', 
                      color='r')

df21 = df2.xs('1-1 Match', axis=1).set_index('X-Axis')
df21.index.name = None
df21.columns = ['1-1 Match']
df21.plot(c='black', ax=ax)

df22 = df2.xs('-15% Peak Flow', axis=1).set_index('X-Axis')
df22.index.name = None
df22.columns = ['-15% Peak Flow']
df22.plot(c='blue',ls='--', ax=ax)

df23 = df2.xs('+25% Peak Flow', axis=1).set_index('X-Axis')
df23.index.name = None
df23.columns = ['+25% Peak Flow']
df23.plot(c='blue',ls='--', ax=ax)

[image: graphs]
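
The three near-identical blocks above could also be folded into a loop driven by a style table. This is only a sketch: the `styles` dictionary is a hypothetical helper, its colors and line styles mirror the ones used above, and df1 is shortened to four rows from the question. The `Agg` backend line only makes the snippet runnable without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove for interactive use
import pandas as pd

df1 = pd.DataFrame({
    'Observed PeakFlow (cfs)': [9.78768, 1.999368, 11.63652, 3.237471],
    'Modelled Peak Flow (cfs)': [10.93963, 2.037152, 8.541796, 3.970588],
})
columns = pd.MultiIndex.from_product(
    [['1-1 Match', '-15% Peak Flow', '+25% Peak Flow'], ['X-Axis', 'Y-Axis']]
)
df2 = pd.DataFrame([[0, 0, 0, 0, 0, 0],
                    [200, 200, 200, 170, 200, 250]], columns=columns)

ax = df1.plot.scatter(x='Modelled Peak Flow (cfs)',
                      y='Observed PeakFlow (cfs)',
                      s=50, marker='d', color='r')

# Hypothetical style table: one entry per reference line.
styles = {
    '1-1 Match': dict(color='black', linestyle='-'),
    '-15% Peak Flow': dict(color='blue', linestyle='--'),
    '+25% Peak Flow': dict(color='blue', linestyle='--'),
}

for name, style in styles.items():
    # Same selection as above: pick one group, index it by its X-Axis values.
    line = df2.xs(name, axis=1).set_index('X-Axis')['Y-Axis']
    line.index.name = None
    line.plot(ax=ax, label=name, **style)

ax.legend()
```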

EDIT1:

The MultiIndex is not read correctly from the CSV file, so you need:

import pandas as pd

# read_csv is a pandas function, not a DataFrame method
df2 = pd.read_csv('file', header=[0,1])

print (df2)
  1-1 Match Unnamed: 1_level_0 -15% Peak Flow Unnamed: 3_level_0  \
     X-Axis             Y-Axis         X-Axis             Y-Axis   
0         0                  0              0                  0   
1       200                200            200                170   

  +25% Peak Flow Unnamed: 5_level_0  
          X-Axis             Y-Axis  
0              0                  0  
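1            200                250

# Take the top header level, blank out the 'Unnamed' placeholders,
# and forward-fill so each Y-Axis column inherits its group's name
cols = df2.columns.get_level_values(0)
cols = cols.where(~cols.str.contains('Unnamed')).to_series().ffill().tolist()
df2.columns = [cols, df2.columns.get_level_values(1)]
# Sort the columns lexicographically by the top level
df2 = df2.sort_index(level=0, axis=1)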
print (df2)
  +25% Peak Flow        -15% Peak Flow        1-1 Match       
          X-Axis Y-Axis         X-Axis Y-Axis    X-Axis Y-Axis
0              0      0              0      0         0      0
1            200    250            200    170       200    200

print (df2.columns)
MultiIndex(levels=[['+25% Peak Flow', '-15% Peak Flow', '1-1 Match'], 
                   ['X-Axis', 'Y-Axis']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])
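
To try the header fix without a file on disk, the following self-contained sketch reads an in-memory CSV. The CSV text is an assumption about what the file looks like, reconstructed from the printed output above. It uses `Series.mask` and `startswith` rather than `Index.where`, which is a small variation on the same forward-fill idea and not the exact code from the answer.

```python
import io
import pandas as pd

# Assumed file contents: merged header cells leave every second field empty.
csv_text = (
    "1-1 Match,,-15% Peak Flow,,+25% Peak Flow,\n"
    "X-Axis,Y-Axis,X-Axis,Y-Axis,X-Axis,Y-Axis\n"
    "0,0,0,0,0,0\n"
    "200,200,200,170,200,250\n"
)

df2 = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Empty header cells come back as 'Unnamed: N_level_0'; replace them with
# missing values and forward-fill from the named cell to their left.
top = pd.Series(df2.columns.get_level_values(0))
top = top.mask(top.str.startswith('Unnamed')).ffill()

df2.columns = pd.MultiIndex.from_arrays(
    [top, df2.columns.get_level_values(1)]
)
df2 = df2.sort_index(level=0, axis=1)

print(df2)
```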