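I need to use a variable as the title of a plot in pandas. I have a CSV file, and from it I create multiple CSV files and plots, one per resource_id in the main CSV file.
Sample content from my CSV file: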
Access_Stat_ID,Resource_ID,Range_Start,Range_End,Name,Format,Number,Matched_URL
6890859,10020,"2014-05-01 00:00:00","2014-05-31 23:59:59","May 2014","html",89,"/dissertationen/biologie/behrend-anke/HTML/behrend-vita.html"
6890860,10021,"2014-05-01 00:00:00","2014-05-31 23:59:59","May 2014","pdf",30,"/dissertationen/biologie/dreier-lars/PDF/Dreier.pdf"
6890861,10021,"2014-05-01 00:00:00","2014-05-31 23:59:59","May 2014","entry",2,"?"
6890862,10021,"2014-05-01 00:00:00","2014-05-31 23:59:59","May 2014","html",11,"/dissertationen/biologie/dreier-lars/HTML/chapter4.html"
This is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

df = pd.read_csv('dbo.Access_Stat_all.csv',error_bad_lines=False, usecols=['Range_Start','Format','Resource_ID','Number'])
uniquevalues = np.unique(df[['Resource_ID']].values)
for resource_id in uniquevalues:
    df1 = df[df['Resource_ID'] == resource_id]
    df1 = df1[['Format', 'Range_Start', 'Number']]
    # truncate the date to only take month and year
    df1["Range_Start"] = df1["Range_Start"].str[:7]
    df1 = df1.groupby(['Format', 'Range_Start'], as_index=True).last()
    pd.options.display.float_format = '{:,.0f}'.format
    df1 = df1.unstack()
    df1.columns = df1.columns.droplevel()
    if df1.index.contains('entry'):
        df2 = df1[1:4].sum(axis=0)
    else:
        df2 = df1[0:3].sum(axis=0)
    df2.name = 'sum'
    df2 = df1.append(df2)
    df2.to_csv('csv_files/' + str(resource_id) + '.csv', sep="\t", float_format='%.f')
    if df2.index.contains('entry'):
        df3 = df2.T[['entry', 'sum']].copy()
    else:
        df3 = df2.T[['sum']].copy()
    # convert index to use pandas datetime format
    df3.index = pd.to_datetime(df3.index)
    # plot the data
    fig, ax = plt.subplots()
    plt.xticks(rotation=90)
    # use matplotlib date formatters
    years = mdates.YearLocator()  # every year
    yearsFmt = mdates.DateFormatter('%Y-%m')
    # format the major ticks
    ax.xaxis.set_major_locator(years)
    ax.xaxis.set_major_formatter(yearsFmt)
    ax.plot(df3)
    ax.legend(["Seitenzugriffe", "Dateiabrufe"])
    plt.tight_layout()
    xtl = [item.get_text()[:4] for item in ax.get_xticklabels()]
    ax.set_xticklabels(xtl)
    fig.savefig('plots/'+ str(resource_id) + '.png')
    plt.close('all')
Now I want the specific resource_id and range_start as the title of each plot. How can I do that?
Answer 0 (score: 0)
First, you have already defined resource_id in the for loop, right? So you can use it as a variable when building the plot:
plt.title(resource_id)
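Each iteration of the for loop produces a different title. With the dataset you provided, resource_id should equal 10020 in the first iteration and 10021 in the second, so two plots will be created and saved. If this is unclear, see the Matplotlib tutorial for more examples of setting titles.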
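Since the question's code already creates an Axes object with plt.subplots(), the same idea can be written against that object. The following is only a minimal sketch: the counts_by_resource dictionary is a hypothetical stand-in for the per-resource data built in the loop, and the "Resource " prefix is an illustrative choice, not something from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical stand-in for the per-resource data built in the question's loop.
counts_by_resource = {10020: [89], 10021: [30, 2, 11]}

titles = []
for resource_id, counts in counts_by_resource.items():
    fig, ax = plt.subplots()
    ax.plot(counts)
    # str() turns the ID into plain text, whatever integer type it has.
    ax.set_title("Resource " + str(resource_id))
    titles.append(ax.get_title())
    plt.close(fig)  # release the figure before the next iteration
```

In the real loop, the fig.savefig(...) call would go after ax.set_title(...) and before the figure is closed, so the title ends up in the saved image.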
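Second, for Range_Start, your dataframe has already been subset so that it contains only the relevant resource_id. You can then iterate over each unique Range_Start value in that subset: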
uniquevalues = np.unique(df[['Resource_ID']].values)
for resource_id in uniquevalues:
    df1 = df[df['Resource_ID'] == resource_id]
    df1 = df1[['Format', 'Range_Start', 'Number']]
    # truncate the date to only take month and year
    df1["Range_Start"] = df1["Range_Start"].str[:7]
    # unique months within this resource's subset
    unique_range_starts = np.unique(df1["Range_Start"].values)
    for range_start in unique_range_starts:
        # all your code to construct the graph goes here....
        pass
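Now each plot title can use both resource_id and range_start as variables. Because resource_id is an integer and range_start is a string, convert the ID with str() before concatenating:
plt.title(str(resource_id) + ' ' + range_start)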
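If a single plot covers several months, as it does in the question's code, one option is to put the first and last month in the title instead of a single value. The sketch below assumes the Range_Start values have already been truncated to 'YYYY-MM' strings; make_title and the exact title wording are hypothetical names chosen for illustration.

```python
def make_title(resource_id, range_starts):
    """Build a plot title from a resource ID and its 'YYYY-MM' range starts."""
    # 'YYYY-MM' strings sort chronologically when sorted as text.
    months = sorted(set(range_starts))
    if not months:
        return f"Resource {resource_id}"
    if len(months) == 1:
        return f"Resource {resource_id} ({months[0]})"
    return f"Resource {resource_id} ({months[0]} to {months[-1]})"


single_month = make_title(10021, ["2014-05", "2014-05", "2014-05"])
several_months = make_title(10020, ["2014-07", "2014-05", "2014-06"])
```

Inside the loop this could be used as plt.title(make_title(resource_id, df1["Range_Start"])), called after the truncation step and before df1 is regrouped.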