Mosaic plot with percentage and count values as labels from a pandas DataFrame

Date: 2016-02-05 09:19:14

Tags: python pandas plot mosaic

I have a pandas DataFrame like this:

     LEVEL_1      LEVEL_2    Freq  Percentage
0       HIGH          HIGH   8842      17.684
1    AVERAGE           LOW   2802       5.604
2        LOW           LOW  22198      44.396
3    AVERAGE       AVERAGE   6804      13.608
4        LOW       AVERAGE   2030       4.060
5       HIGH       AVERAGE   3666       7.332
6    AVERAGE          HIGH   2887       5.774
7        LOW          HIGH    771       1.542

I can get the mosaic plot for LEVEL_1 and LEVEL_2 with:
 from statsmodels.graphics.mosaicplot import mosaic
 mosaic(df, ['LEVEL_1','LEVEL_2'])

I just want to put Freq and Percentage at the center of each tile in the mosaic plot. How can I do this?

1 answer:

Answer 0 (score: 2)

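Here is a start. Note that I had to add a row of zeros to the DataFrame for the HIGH/LOW combination, which is missing from the data. You can make the labels nicer with string formatting in the lambda function. You will probably also want to reorder the headings.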

import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic
import io
d = io.StringIO()
d.write("""     LEVEL_1      LEVEL_2    Freq  Percentage\n
       HIGH          HIGH   8842      17.684\n
    AVERAGE           LOW   2802       5.604\n
        LOW           LOW  22198      44.396\n
    AVERAGE       AVERAGE   6804      13.608\n
        LOW       AVERAGE   2030       4.060\n
       HIGH       AVERAGE   3666       7.332\n
    AVERAGE          HIGH   2887       5.774\n
        LOW          HIGH    771       1.542""")
d.seek(0)
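# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in newer pandas)
df = pd.read_csv(d, sep=r'\s+')
# Add the missing HIGH/LOW combination with zero counts
# (DataFrame.append was removed in pandas 2.0, so use pd.concat)
zero_row = pd.DataFrame([{'LEVEL_1': 'HIGH', 'LEVEL_2': 'LOW', 'Freq': 0, 'Percentage': 0}])
df = pd.concat([df, zero_row], ignore_index=True)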
df = df.sort_values(['LEVEL_1', 'LEVEL_2'])
df = df.set_index(['LEVEL_1', 'LEVEL_2'])
print(df)

mosaic(df['Freq'], labelizer=lambda k: df.loc[k].values);

plot from a Jupyter notebook
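
The answer suggests improving the labels with string formatting in the labelizer. A minimal sketch of that idea follows, assuming the same indexed DataFrame as above. The names `tile_label` and `draw_mosaic` are hypothetical helpers introduced here for illustration and are not part of statsmodels; the label layout (count on one line, percentage on the next) is just one possible choice.

```python
import pandas as pd

# Data from the question, plus the zero row the answer adds for HIGH/LOW.
rows = [
    ("HIGH", "HIGH", 8842, 17.684),
    ("AVERAGE", "LOW", 2802, 5.604),
    ("LOW", "LOW", 22198, 44.396),
    ("AVERAGE", "AVERAGE", 6804, 13.608),
    ("LOW", "AVERAGE", 2030, 4.060),
    ("HIGH", "AVERAGE", 3666, 7.332),
    ("AVERAGE", "HIGH", 2887, 5.774),
    ("LOW", "HIGH", 771, 1.542),
    ("HIGH", "LOW", 0, 0.0),
]
df = (
    pd.DataFrame(rows, columns=["LEVEL_1", "LEVEL_2", "Freq", "Percentage"])
    .sort_values(["LEVEL_1", "LEVEL_2"])
    .set_index(["LEVEL_1", "LEVEL_2"])
)


def tile_label(key):
    """Hypothetical helper: label text for the tile keyed by (LEVEL_1, LEVEL_2)."""
    row = df.loc[key]
    if row["Freq"] == 0:
        return ""  # leave the empty tile unlabelled
    # Count on the first line, percentage rounded to one decimal on the second.
    return f"{int(row['Freq'])}\n{row['Percentage']:.1f}%"


def draw_mosaic():
    """Hypothetical helper: draw the mosaic using tile_label as the labelizer."""
    # Plotting imports are local so tile_label can be tried without them installed.
    import matplotlib.pyplot as plt
    from statsmodels.graphics.mosaicplot import mosaic

    fig, _ = mosaic(df["Freq"], labelizer=tile_label)
    plt.show()
    return fig
```

Calling `tile_label(("HIGH", "HIGH"))` returns the two-line string `"8842\n17.7%"`, and `draw_mosaic()` passes the same function to `mosaic` in place of the lambda used above.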