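I'm trying to create a scatter plot of yearly values, with a line showing the average of x for each year.
(I drew that line onto the chart, and the xticks along the bottom should increase in "season" order.)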
I'm stuck on the second plot: when it reaches this line, I get a "tuple index out of range" error:
ax2.plot(x2, y2, color='r')
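I'm not sure I'm approaching this correctly, but my main DataFrame holds all my values, and I then created a groupby Series with the mean for each season/year combination. I couldn't plot that, so I converted it to a DataFrame and reset its index, hoping that would help. It didn't. I don't know where to go from here.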
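The groupby and reshaping step can be checked on its own. This sketch uses a small made-up frame in place of the Excel data (all values are hypothetical) and shows that after `to_frame().reset_index()` the `Season_Year` column still has the `category` dtype, so the converted DataFrame most likely hands `ax2.plot` the same kind of categorical data as the original Series did. Passing `observed=True` to `groupby` keeps only the seasons that actually occur, which should make the `grouped.dropna()` call largely unnecessary.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: season labels stored as a
# Categorical, plus a Score column with made-up values.
order = ["Spring 2008", "Summer 2008", "Fall 2008"]
df = pd.DataFrame({
    "Season_Year": pd.Categorical(
        ["Spring 2008", "Spring 2008", "Summer 2008", "Fall 2008"],
        categories=order, ordered=True),
    "Score": [0.2, 0.4, 0.9, 0.5],
})

# Mean Score per season; observed=True skips categories with no rows.
grouped = df.groupby("Season_Year", observed=True)["Score"].mean()

# Same reshaping as in the question: Series -> DataFrame -> default index.
df2 = grouped.to_frame().reset_index()

print(df2["Season_Year"].dtype)  # category: the column is still categorical
print(df2)
```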
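The problems started when I created the pandas Categorical, but that was the only way I could think of to sort the data correctly. Maybe that is the problem, but I'm not sure how else to sort it by season and year so the labels come out right.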
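Using a Categorical is a reasonable way to get chronological order. A minimal sketch of building the order list programmatically and sorting by it follows; it assumes every year from 2008 to 2018 has exactly the Spring, Summer and Fall terms, as in the question's `season` list, and the sample labels are made up.

```python
import pandas as pd

# Chronological label order, built instead of typed out by hand
# (assumes three terms per year, ending with "Spring 2019" as in the question).
terms = ["Spring", "Summer", "Fall"]
season = [f"{t} {y}" for y in range(2008, 2019) for t in terms] + ["Spring 2019"]

# Made-up labels in arbitrary order.
labels = pd.Series(["Fall 2008", "Spring 2009", "Summer 2008", "Spring 2008"])

# With ordered=True the sort follows the order of `categories`,
# not alphabetical order.
cat = pd.Categorical(labels, categories=season, ordered=True)
sorted_labels = pd.Series(cat).sort_values().tolist()
print(sorted_labels)
# ['Spring 2008', 'Summer 2008', 'Fall 2008', 'Spring 2009']
```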
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
file = r"C:\myfile.xlsx"
df = pd.read_excel(file)
season = ["Spring 2008", "Summer 2008", "Fall 2008",
"Spring 2009", "Summer 2009", "Fall 2009",
"Spring 2010", "Summer 2010", "Fall 2010",
"Spring 2011", "Summer 2011", "Fall 2011",
"Spring 2012", "Summer 2012", "Fall 2012",
"Spring 2013", "Summer 2013", "Fall 2013",
"Spring 2014", "Summer 2014", "Fall 2014",
"Spring 2015", "Summer 2015", "Fall 2015",
"Spring 2016", "Summer 2016", "Fall 2016",
"Spring 2017", "Summer 2017", "Fall 2017",
"Spring 2018", "Summer 2018", "Fall 2018",
"Spring 2019"]
df = df.loc[df['Total'] > 100]
df['Season_Year'] = df.apply(lambda row: row.Semester + " " + str(row.Year), axis=1)
df['Season_Year'] = pd.Categorical(df['Season_Year'], season)
df.sort_values(by='Season_Year', inplace=True, ascending=True)
df = df.dropna()
df['Score'] = df.apply(lambda row: row.Respondents / row.Total, axis=1)
grouped = df.groupby('Season_Year')['Score'].mean()
grouped = grouped.dropna()
df2 = grouped.to_frame()
df2 = df2.reset_index()
df2.head()
x = df['Season_Year']
y = df['Score']
x2 = df2['Season_Year']
y2 = df2['Score']
fig, ax = plt.subplots()
ax.scatter(x, y, marker='o', color='black')
ax2 = ax.twinx()
ax2.plot(x2, y2, color='r')
ax.set_ylim(0, 1.1)
ax2.set_ylim(0, 1.1)
ax.set_xticklabels(season, rotation='vertical')
plt.show()
Answer (score: 1)
You can plot them (almost) directly in one line, like this:
ax2 = ax.twinx()
ax2.plot(list(x2.values), list(y2.values), color='r')
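A self-contained version of that one-liner is sketched below. The DataFrame is a hypothetical stand-in for `df2` with made-up scores, and the Agg backend is selected only so the sketch runs without a display. It shows the list-based call working on a `twinx` axis; whether the original IndexError appears at all may depend on the installed pandas and matplotlib versions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

season = ["Spring 2008", "Summer 2008", "Fall 2008"]
# Hypothetical stand-in for df2: categorical labels plus per-season means.
df2 = pd.DataFrame({
    "Season_Year": pd.Categorical(season, categories=season),
    "Score": [0.3, 0.9, 0.5],
})
x2 = df2["Season_Year"]
y2 = df2["Score"]

fig, ax = plt.subplots()
ax2 = ax.twinx()
# Pass plain Python lists instead of the categorical Series itself.
line, = ax2.plot(list(x2.values), list(y2.values), color="r")
ax2.set_ylim(0, 1.1)
```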
Alternatively, you can explicitly extract the values into lists, like this:
x2 = [x2[n] for n in range(x2.shape[0])]
y2 = [y2[n] for n in range(y2.shape[0])]
and then plot them the same way as in your example:
ax2 = ax.twinx()
ax2.plot(x2, y2, color='r')
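The index-based loop and `Series.tolist()` produce the same list here. The loop works because `reset_index()` gave `df2` a default 0..n-1 index: `x2[n]` is a lookup by index label, so it would fail if the index were anything else, while `tolist()` simply returns the values in row order. A short sketch with made-up data:

```python
import pandas as pd

season = ["Spring 2008", "Summer 2008", "Fall 2008"]
# Hypothetical stand-in for df2 after reset_index() (default RangeIndex).
df2 = pd.DataFrame({
    "Season_Year": pd.Categorical(season, categories=season),
    "Score": [0.3, 0.9, 0.5],
})
x2 = df2["Season_Year"]

# The answer's loop: x2[n] looks each value up by index label.
by_label = [x2[n] for n in range(x2.shape[0])]

# tolist() returns the values in row order, independent of the index.
as_list = x2.tolist()

print(by_label == as_list)  # True
```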