Subplots from a groupby DataFrame in pandas?

Asked: 2016-01-10 22:19:31

Tags: python csv pandas matplotlib plot

I have a CSV that looks something like this:

date,name,area,score
10/15/2015,john,metallurgy,92
10/16/2015,john,metallurgy,84
10/16/2015,nancy,metallurgy,97
10/17/2015,nancy,metallurgy,76
10/18/2015,john,forestry,81
10/18/2015,john,forestry,46
10/19/2015,nancy,forestry,81
10/19/2015,nancy,forestry,74
10/23/2015,nancy,forestry,83

I'd like one figure containing subplots for each person (name). I'd like it to look something like this:

plots show here

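Also, as the picture shows, in addition to plotting the actual score points, I'd like to be able to plot an exponentially weighted moving average (EWMA) curve, a series of linear regression lines, and so on.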
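To make the overlays concrete, here is a minimal sketch of how the EWMA and a straight-line fit might be computed per (name, area) group. The inline CSV copy, the `span=3` smoothing window, the `fit_line` helper, and the `ewma` / `trend` column names are all assumptions for illustration, and the regression is fitted against observation order rather than calendar date.

```python
import io

import numpy as np
import pandas as pd

# Inline copy of the sample CSV so the sketch is self-contained.
csv_text = """date,name,area,score
10/15/2015,john,metallurgy,92
10/16/2015,john,metallurgy,84
10/16/2015,nancy,metallurgy,97
10/17/2015,nancy,metallurgy,76
10/18/2015,john,forestry,81
10/18/2015,john,forestry,46
10/19/2015,nancy,forestry,81
10/19/2015,nancy,forestry,74
10/23/2015,nancy,forestry,83
"""

df = pd.read_csv(io.StringIO(csv_text))
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

grouped_scores = df.groupby(["name", "area"])["score"]

# EWMA within each group; span=3 is an arbitrary, assumed smoothing window.
df["ewma"] = grouped_scores.transform(lambda s: s.ewm(span=3).mean())


def fit_line(scores):
    """Hypothetical helper: least-squares line through one group's scores."""
    x = np.arange(len(scores))  # observation order, not calendar time
    slope, intercept = np.polyfit(x, scores.to_numpy(dtype=float), deg=1)
    return pd.Series(slope * x + intercept, index=scores.index)


# Fitted regression value for every row, computed group by group.
df["trend"] = grouped_scores.transform(fit_line)

print(df)
```

Both new columns line up row for row with `score`, so they can be drawn on the same axes as the raw points.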

I think I could do this in plain Python / vanilla matplotlib with something like the following:

import math

import matplotlib.pyplot as plt
import pandas as pd

plots_per_row = 3  # number of subplot columns
df = pd.read_csv("data.csv")

# scores per (name, area), plus subplot and row counts per name
by_name_and_area = {}
nplots_by_name = {}
nrows_by_name = {}

for name, namerows in df.groupby('name'):
    by_name_and_area[name] = {}

    for area, rows in namerows.groupby('area'):
        by_name_and_area[name][area] = rows['score']

    # one subplot per area for this person
    nplots_by_name[name] = len(by_name_and_area[name])

    # decide number of rows (round up so every subplot fits)
    nrows_by_name[name] = math.ceil(nplots_by_name[name] / plots_per_row)

    # create figure & subplots and iterate through to plot each
    # squeeze=False keeps axes a 2-D array even when there is a single row
    fig, axes = plt.subplots(nrows_by_name[name], plots_per_row,
                             sharex=True, sharey=True, squeeze=False)
    # ... etc, etc

But I'd rather do this in pandas, because I'm trying to learn it, and Wes has usually thought of everything well before most of us have.
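For reference, a pandas-leaning version might look roughly like the sketch below: matplotlib still creates the grid, but `groupby` drives the loops and `DataFrame.plot(ax=...)` does the drawing. This is only one possible arrangement, not an established pandas recipe. The inline CSV copy, the `figures` dict, the `span=3` window, and the `Agg` backend (chosen so the sketch runs without a display) are assumptions.

```python
import io
import math

import matplotlib
matplotlib.use("Agg")  # assumed headless backend; drop this for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Inline copy of the sample CSV so the sketch is self-contained.
csv_text = """date,name,area,score
10/15/2015,john,metallurgy,92
10/16/2015,john,metallurgy,84
10/16/2015,nancy,metallurgy,97
10/17/2015,nancy,metallurgy,76
10/18/2015,john,forestry,81
10/18/2015,john,forestry,46
10/19/2015,nancy,forestry,81
10/19/2015,nancy,forestry,74
10/23/2015,nancy,forestry,83
"""

df = pd.read_csv(io.StringIO(csv_text))
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

plots_per_row = 3  # number of subplot columns
figures = {}       # hypothetical container: one figure per name

for name, person in df.groupby("name"):
    areas = list(person.groupby("area"))  # [(area, rows), ...]
    nrows = math.ceil(len(areas) / plots_per_row)
    fig, axes = plt.subplots(nrows, plots_per_row, sharex=True, sharey=True,
                             squeeze=False, figsize=(4 * plots_per_row, 3 * nrows))
    flat_axes = axes.ravel()

    for ax, (area, rows) in zip(flat_axes, areas):
        # Raw scores as points and their EWMA as a line, in one plot call.
        panel = rows.set_index("date")[["score"]]
        panel["ewma"] = panel["score"].ewm(span=3).mean()  # span=3 is assumed
        panel.plot(ax=ax, style=["o", "-"], title=area, legend=False)

    # Hide grid cells that have no area to show.
    for ax in flat_axes[len(areas):]:
        ax.set_visible(False)

    fig.suptitle(name)
    figures[name] = fig
```

Each entry in `figures` can then be saved with, for example, `figures["john"].savefig("john.png")`.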

Any ideas?

0 answers:

No answers yet