I have been able to use pandas groupby to create a new DataFrame, but I'm getting an error when I create a barplot.
The groupby command:
invYr = invoices.groupby(['FinYear']).sum()[['Amount']]
This creates a new DataFrame that looks correct to me.
Running:
sns.barplot(x='FinYear', y='Amount', data=invYr)
I get the error:
ValueError: Could not interperet input 'FinYear'
It appears that the issue is related to the index being FinYear, but unfortunately I have not been able to solve it, even when using reindex.
Answer 0 (score: 13)
import pandas as pd
import seaborn as sns
invoices = pd.DataFrame({'FinYear': [2015, 2015, 2014], 'Amount': [10, 10, 15]})
invYr = invoices.groupby(['FinYear']).sum()[['Amount']]
>>> invYr
         Amount
FinYear
2014         15
2015         20
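One quick way to see what is going on is to inspect the grouped frame's columns and index. The following is a minimal sketch using the same sample data; the variable names `columns`, `index_name`, `has_finyear_column`, and `reindexed` are introduced here only for illustration. It also shows why reindex does not help: reindex changes which row labels are present, but it does not turn the index into a column.

```python
import pandas as pd

invoices = pd.DataFrame({'FinYear': [2015, 2015, 2014], 'Amount': [10, 10, 15]})
invYr = invoices.groupby(['FinYear']).sum()[['Amount']]

# After the groupby, 'FinYear' names the index; it is not among the columns.
columns = list(invYr.columns)
index_name = invYr.index.name
has_finyear_column = 'FinYear' in invYr.columns

print(columns)             # only 'Amount' is a column
print(index_name)          # the index carries the name 'FinYear'
print(has_finyear_column)

# reindex only conforms the rows to a given set of labels;
# 'FinYear' is still not a column afterwards.
reindexed = invYr.reindex([2014, 2015])
print('FinYear' in reindexed.columns)
```

Since seaborn looks up the strings passed as x and y among the columns of data, a name that only exists on the index is not found.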
The reason you are getting the error is that when you created invYr by grouping invoices, the FinYear column became the index and is no longer a column. There are a few solutions:
1) One solution is to pass the data directly. If you do not specify a data parameter, Seaborn does not know which dataframe/series holds 'FinYear' or 'Amount', as these are just text values. You must instead pass the values themselves, for example y=invYr.Amount, which identifies both the dataframe/series and the column you'd like to graph. The trick here is to access the index of the dataframe directly for x.
sns.barplot(x=invYr.index, y=invYr.Amount)
2) Alternatively, you can specify the data source and then refer to its columns by name. Note that the grouped dataframe has its index reset so that FinYear becomes available as a column again.
sns.barplot(x='FinYear', y='Amount', data=invYr.reset_index())
3) A third solution is to specify as_index=False when you perform the groupby, which keeps the column available in the grouped dataframe.
invYr = invoices.groupby('FinYear', as_index=False).Amount.sum()
sns.barplot(x='FinYear', y='Amount', data=invYr)
All three solutions produce the same plot.
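To check that the three approaches really hand the same values to the plot, the x and y data can be compared without drawing anything. This is a rough sketch on the sample data from above; the names `by_index`, `flat`, and `no_index` are made up for this comparison and are not part of the original code.

```python
import pandas as pd

invoices = pd.DataFrame({'FinYear': [2015, 2015, 2014], 'Amount': [10, 10, 15]})

# Solution 1 uses the frame that is indexed by FinYear.
by_index = invoices.groupby(['FinYear']).sum()[['Amount']]
# Solution 2 moves the index back into an ordinary column.
flat = by_index.reset_index()
# Solution 3 never moves FinYear into the index in the first place.
no_index = invoices.groupby('FinYear', as_index=False).Amount.sum()

# The x/y values each solution would pass to sns.barplot.
x1, y1 = list(by_index.index), list(by_index.Amount)
x2, y2 = list(flat.FinYear), list(flat.Amount)
x3, y3 = list(no_index.FinYear), list(no_index.Amount)

print(x1 == x2 == x3)
print(y1 == y2 == y3)
```

Which one to use is mostly a matter of taste: reset_index is handy when the grouped frame already exists, while as_index=False avoids the extra step if you control the groupby call.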