I have CSV content similar to the following:
date,name,area,score
10/15/2015,john,metallurgy,92
10/16/2015,john,metallurgy,84
10/16/2015,nancy,metallurgy,97
10/17/2015,nancy,metallurgy,76
10/18/2015,john,forestry,81
10/18/2015,john,forestry,46
10/19/2015,nancy,forestry,81
10/19/2015,nancy,forestry,74
10/23/2015,nancy,forestry,83
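For each person (name), I'd like a figure containing one subplot per area. I'd like it to look something like this:
In addition, as shown in the figure, besides plotting the actual score points, I'd like to be able to overlay an exponentially weighted moving average (ewma) curve, a series of linear regression lines, and so on.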
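To make the overlay part concrete, here is a rough sketch of the two derived series I have in mind for each (name, area) group. The `span=3` smoothing, the use of observation order rather than the date as the x value for the fit, and the column names `ewma` and `trend` are all my own assumptions, not requirements.

```python
import io

import numpy as np
import pandas as pd

# Same rows as the CSV above, inlined so the sketch is self-contained.
csv_text = """date,name,area,score
10/15/2015,john,metallurgy,92
10/16/2015,john,metallurgy,84
10/16/2015,nancy,metallurgy,97
10/17/2015,nancy,metallurgy,76
10/18/2015,john,forestry,81
10/18/2015,john,forestry,46
10/19/2015,nancy,forestry,81
10/19/2015,nancy,forestry,74
10/23/2015,nancy,forestry,83
"""
df = pd.read_csv(io.StringIO(csv_text))

# Hypothetical output columns, filled in one group at a time.
df["ewma"] = np.nan
df["trend"] = np.nan

for (name, area), grp in df.groupby(["name", "area"]):
    # Exponentially weighted moving average of the scores (span=3 is an assumption).
    df.loc[grp.index, "ewma"] = grp["score"].ewm(span=3, adjust=False).mean()

    # Least-squares straight line through the scores, using 0, 1, 2, ... as x
    # (assumption: observation order, not the actual date spacing).
    x = np.arange(len(grp))
    slope, intercept = np.polyfit(x, grp["score"].to_numpy(dtype=float), 1)
    df.loc[grp.index, "trend"] = slope * x + intercept

print(df[["name", "area", "score", "ewma", "trend"]])
```

Each subplot would then draw `score` as points and `ewma` or `trend` as a line on the same axes.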
I think I could do this in Python with vanilla matplotlib using something like the following:
import pandas as pd
import matplotlib.pyplot as plt

plots_per_row = 3  # number of subplot columns per figure
df = pd.read_csv("data.csv")

by_name_and_area = {}  # name -> {area -> scores}
nplots_by_name = {}    # name -> number of subplots (one per area)
nrows_by_name = {}     # name -> number of subplot rows

for name, namerows in df.groupby('name'):
    by_name_and_area[name] = {}
    nplots_by_name[name] = 0
    for area, rows in namerows.groupby('area'):
        by_name_and_area[name][area] = rows['score']
        nplots_by_name[name] += 1

    # decide number of rows: full rows, plus one more for any leftover plots
    nrows_by_name[name] = nplots_by_name[name] // plots_per_row
    if nplots_by_name[name] % plots_per_row > 0:
        nrows_by_name[name] += 1

    # create this person's figure & subplots and iterate through to plot each;
    # squeeze=False keeps axes a 2-D array even when there is a single row
    fig, axes = plt.subplots(nrows_by_name[name], plots_per_row,
                             sharex=True, sharey=True, squeeze=False)
    # ... etc, etc
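The bookkeeping in that loop can probably be shortened. A rough sketch, assuming one subplot per distinct area and the same `plots_per_row` as above; it only covers the counting, not the plotting:

```python
import io
import math

import pandas as pd

plots_per_row = 3  # number of subplot columns per figure

# Same rows as the CSV above, inlined so the sketch is self-contained.
csv_text = """date,name,area,score
10/15/2015,john,metallurgy,92
10/16/2015,john,metallurgy,84
10/16/2015,nancy,metallurgy,97
10/17/2015,nancy,metallurgy,76
10/18/2015,john,forestry,81
10/18/2015,john,forestry,46
10/19/2015,nancy,forestry,81
10/19/2015,nancy,forestry,74
10/23/2015,nancy,forestry,83
"""
df = pd.read_csv(io.StringIO(csv_text))

# One subplot per distinct area for each person.
nplots_by_name = df.groupby("name")["area"].nunique()

# Ceiling division gives the number of subplot rows each figure needs.
nrows_by_name = nplots_by_name.apply(lambda n: math.ceil(n / plots_per_row))

print(nplots_by_name.to_dict())
print(nrows_by_name.to_dict())
```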
But I'd rather do this in pandas itself, since I'm trying to learn it, and Wes has usually thought of everything before most of us do.
Any ideas?