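I have a CSV with several columns, one of which is a city column. It contains several cities, and the same city can appear several times. I want to build a bar chart showing how many times each city appears in the CSV. Example: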
Y X
5 Belo Horizonte
1 Vespasiano
4 São Paulo
I wrote the following code, but it raises the error shown after the code.
Code:
import matplotlib.pyplot as plt; plt.rcdefaults()
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# reading the file
tb_usuarios = 'tb_usuarios.csv'
usuarios = pd.read_csv(tb_usuarios,
header=0,
index_col=False
)
print(usuarios.head())
usuarios["vc_municipio"] = usuarios["vc_municipio"].dropna()
usuarios["vc_municipio"] = usuarios["vc_municipio"].str.upper()
municipio = usuarios.groupby(['vc_municipio'])
print(municipio)
y_pos = usuarios.groupby(['vc_municipio'])['vc_municipio'].count()
print(y_pos)
plt.bar(y_pos, municipio, align='center', alpha=0.5)
plt.xticks(y_pos, municipio)
plt.ylabel('Qtd')
plt.title('Municipio')
plt.show()
Error:
Traceback (most recent call last):
File "C:/Users/Henrique Mendes/PycharmProjects/emprestimo/venv1/emprestimo.py", line 20, in <module>
plt.bar(y_pos, municipio, align='center', alpha=0.5)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\pyplot.py", line 2440, in bar
**({"data": data} if data is not None else {}), **kwargs)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\__init__.py", line 1601, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\axes\_axes.py", line 2348, in bar
self._process_unit_info(xdata=x, ydata=height, kwargs=kwargs)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\axes\_base.py", line 2126, in _process_unit_info
kwargs = _process_single_axis(ydata, self.yaxis, 'yunits', kwargs)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\axes\_base.py", line 2108, in _process_single_axis
axis.update_units(data)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\axis.py", line 1493, in update_units
default = self.converter.default_units(data, self)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\category.py", line 115, in default_units
axis.set_units(UnitData(data))
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\category.py", line 181, in __init__
self.update(data)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\category.py", line 215, in update
for val in OrderedDict.fromkeys(data):
TypeError: unhashable type: 'numpy.ndarray'
My output:
"C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\Scripts\python.exe" "C:/Users/Henrique Mendes/PycharmProjects/emprestimo/venv1/emprestimo.py"
pr_usuario bl_administrador dt_nascimento ... dt_cheque es_anexo dt_anexo
0 2 0 24/02/1980 ... NaN NaN NaN
1 3 0 05/09/1985 ... NaN NaN NaN
2 4 1 20/03/1984 ... NaN NaN NaN
3 5 1 20/01/1982 ... NaN NaN NaN
4 6 0 25/05/1985 ... NaN NaN NaN
[5 rows x 30 columns]
{'BELO HORIZONTE': Int64Index([0, 1, 2, 3, 6, 9, 10, 14, 17, 20, 22, 25], dtype='int64'), 'BRASILIA': Int64Index([4], dtype='int64'), 'CONTAGEM': Int64Index([23], dtype='int64'), 'CURITIBA': Int64Index([5, 7, 15, 18, 19], dtype='int64'), 'SANTA LUZIA': Int64Index([21], dtype='int64'), 'VESPASIANO': Int64Index([24], dtype='int64')}
vc_municipio
BELO HORIZONTE 12
BRASILIA 1
CONTAGEM 1
CURITIBA 5
SANTA LUZIA 1
VESPASIANO 1
Name: vc_municipio, dtype: int64
How can I make this chart?
Answer 0 (score: 1)
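The line
municipio = usuarios.groupby(['vc_municipio'])
returns a pandas groupby object. That is what causes your error, because matplotlib cannot handle that object.
plt.bar takes x values and heights (the y values); see the docs: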
matplotlib.pyplot.bar(x, height, width=0.8, bottom=None, *, align='center', data=None, **kwargs)
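To make the cause of the error concrete, here is a minimal sketch with made-up data (the small DataFrame is an assumption, not the asker's file). Iterating over a groupby object yields (group key, sub-DataFrame) pairs rather than numbers, so it cannot serve as the height argument of plt.bar; only after aggregating do you get one number per city.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data
usuarios = pd.DataFrame(
    {"vc_municipio": ["BELO HORIZONTE", "CURITIBA", "BELO HORIZONTE"]}
)

municipio = usuarios.groupby(["vc_municipio"])

# Each item is a (key, DataFrame) pair, not a bar height
first_key, first_group = next(iter(municipio))
print(type(first_group).__name__)

# Aggregating turns the groups into plain numbers, indexed by city
counts = municipio["vc_municipio"].count()
print(counts.to_dict())
```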
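Fortunately, when you do a groupby in pandas, it automatically puts the x values (the categories) into the index.
Assuming municipio is meant to be the list of categories (you want a count per city), the following should work.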
Replace this line of your code
plt.bar(y_pos, municipio, align='center', alpha=0.5)
with
plt.bar(y_pos.index, y_pos, align='center', alpha=0.5)
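Putting the fix together, a self-contained sketch might look like the following. The inline DataFrame is a hypothetical stand-in for tb_usuarios.csv, and the helper name plot_city_counts is made up for illustration; only the idea of passing the index as x and the counts as heights comes from the replacement line above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for pd.read_csv('tb_usuarios.csv')
usuarios = pd.DataFrame(
    {
        "vc_municipio": [
            "Belo Horizonte", "belo horizonte", "Curitiba",
            None, "Vespasiano", "Curitiba", "Belo Horizonte",
        ]
    }
)

# Normalise case so the same city is counted once
usuarios["vc_municipio"] = usuarios["vc_municipio"].str.upper()

# Count rows per city; rows with a missing city are left out of the groups
y_pos = usuarios.groupby("vc_municipio")["vc_municipio"].count()


def plot_city_counts(counts):
    """Draw one bar per city (illustrative helper, not part of any library)."""
    fig, ax = plt.subplots()
    # x = city names from the index, height = the counts
    ax.bar(counts.index, counts.values, align="center", alpha=0.5)
    ax.set_ylabel("Qtd")
    ax.set_title("Municipio")
    ax.tick_params(axis="x", labelrotation=45)
    return ax


ax = plot_city_counts(y_pos)
```

Call plt.show() afterwards to display the figure.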
Alternatively, you can use the pandas version of plt.bar (pandas plotting extends matplotlib), which natively handles some of the DataFrame quirks.
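As a sketch of that pandas route: a Series of counts has a plot.bar() method that uses the index as the category labels and draws onto a matplotlib Axes. The data below is invented for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented example data
cities = pd.Series(
    ["BELO HORIZONTE", "CURITIBA", "BELO HORIZONTE", "VESPASIANO"],
    name="vc_municipio",
)

# value_counts() counts occurrences of each distinct value;
# sort_index() orders the bars alphabetically by city
counts = cities.value_counts().sort_index()

# pandas calls matplotlib under the hood and labels the bars from the index
fig, ax = plt.subplots()
counts.plot.bar(ax=ax, alpha=0.5)
ax.set_ylabel("Qtd")
ax.set_title("Municipio")
```

As with plain matplotlib, plt.show() displays the result.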
Answer 1 (score: 1)
Using pandas:
With the data in a .csv file, in the following format:
0.0,BELO HORIZONTE
1.0,BELO HORIZONTE
2.0,BELO HORIZONTE
3.0,BELO HORIZONTE
6.0,BELO HORIZONTE
9.0,BELO HORIZONTE
10.0,BELO HORIZONTE
14.0,BELO HORIZONTE
17.0,BELO HORIZONTE
20.0,BELO HORIZONTE
22.0,BELO HORIZONTE
25.0,BELO HORIZONTE
4.0,BRASILIA
23.0,CONTAGEM
5.0,CURITIBA
7.0,CURITIBA
15.0,CURITIBA
18.0,CURITIBA
19.0,CURITIBA
21.0,SANTA LUZIA
24.0,VESPASIANO
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('test.csv', header=None)
df.columns = ['value', 'city']
value city
0 0.0 BELO HORIZONTE
1 1.0 BELO HORIZONTE
2 2.0 BELO HORIZONTE
3 3.0 BELO HORIZONTE
4 6.0 BELO HORIZONTE
5 9.0 BELO HORIZONTE
6 10.0 BELO HORIZONTE
7 14.0 BELO HORIZONTE
8 17.0 BELO HORIZONTE
9 20.0 BELO HORIZONTE
10 22.0 BELO HORIZONTE
11 25.0 BELO HORIZONTE
12 4.0 BRASILIA
13 23.0 CONTAGEM
14 5.0 CURITIBA
15 7.0 CURITIBA
16 15.0 CURITIBA
17 18.0 CURITIBA
18 19.0 CURITIBA
19 21.0 SANTA LUZIA
20 24.0 VESPASIANO
# groupby & count
city_count = df.groupby('city').count()
value
city
BELO HORIZONTE 12
BRASILIA 1
CONTAGEM 1
CURITIBA 5
SANTA LUZIA 1
VESPASIANO 1
# plot
city_count.plot.bar()
plt.ylabel('Qtd')
plt.title('Municipio')
plt.show()
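One detail worth knowing about the groupby step above: count() reports the number of non-missing values in each remaining column, while size() reports the number of rows in each group, and value_counts() on the city column gives the same per-city row totals as a Series. A small sketch with invented data (one value is deliberately missing to make the difference visible):

```python
import numpy as np
import pandas as pd

# Invented data; the second BRASILIA row has a missing 'value'
df = pd.DataFrame(
    {
        "value": [0.0, 1.0, 4.0, np.nan],
        "city": ["BELO HORIZONTE", "BELO HORIZONTE", "BRASILIA", "BRASILIA"],
    }
)

by_count = df.groupby("city").count()["value"]  # non-missing values per city
by_size = df.groupby("city").size()             # rows per city
by_value_counts = df["city"].value_counts().sort_index()

print(by_count.to_dict())
print(by_size.to_dict())
```

If rows per city is what you want to plot, by_size.plot.bar() does not depend on which other columns happen to have gaps.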
Plotting with seaborn:
import seaborn as sns
sns.barplot(x=city_count.index, y='value', data=city_count)
plt.xticks(rotation=45)
plt.show()