Original DataFrame
Country Gender Arr-Dep Year Value
0 Austria Male IN 1974 13728
1 Austria Male OUT 1974 17977
2 Austria Female IN 1974 8541
3 Austria Female OUT 1974 8450
4 Austria Total IN 1974 22269
5 Austria Total OUT 1974 26427
6 Belgium Male IN 1974 2412
7 Belgium Male OUT 1974 2800
8 Belgium Female IN 1974 2105
9 Belgium Female OUT 1974 2100
10 Belgium Total IN 1974 4517
At the start of my code, I import the following libraries (in a Jupyter notebook, using Plotly in offline mode):
import pandas as pd
import numpy as np
import plotly as py
import plotly.figure_factory as ff
import plotly.graph_objs as go
from IPython import display
import os
py.offline.init_notebook_mode()
Then, to avoid errors, I replace the '-' values with NaN and group by the column I need (Year):
# Replace non-numeric values in the Value column
df1['Value'] = df1['Value'].replace('-', np.nan)
# Group by Year and sum the values
df1 = df1.groupby(['Year'], as_index=False)['Value'].sum()
Then I create the figure with Plotly:
# Plot everything in a graph (go.Line is deprecated; go.Scatter with mode="lines" draws the same line chart)
py.offline.iplot({
    "data": [go.Scatter(x=df1.Year,
                        y=df1.Value,
                        mode="lines")],
    "layout": go.Layout(title="Immigration through the years")
})
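My question is: can the last part, where the figure is created, be changed so that it filters/replaces the values or does the groupby itself? That way I could get rid of the two steps before creating the figure.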
Answer 0 (score: 1)
Your approach already looks like the correct and clean way to do it!
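The two lines involving replace and groupby are data preparation steps. The last step is the visualization (or data presentation) step. Keeping them separate makes your code more readable!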
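One way to keep that separation explicit is to put each stage in its own function. The following is only a minimal sketch: the function names prepare_yearly_totals and build_figure_spec are hypothetical, the two 1975 rows are made up for illustration, and it assumes the Value column was read as strings. The figure is described as a plain dict, which could then be passed to py.offline.iplot in the same way as the dict in the question.

```python
import pandas as pd

# First two rows are taken from the question's table;
# the two 1975 rows are hypothetical and only illustrate the '-' case.
raw = pd.DataFrame({
    "Country": ["Austria", "Austria", "Austria", "Austria"],
    "Gender": ["Male", "Male", "Male", "Male"],
    "Arr-Dep": ["IN", "OUT", "IN", "OUT"],
    "Year": [1974, 1974, 1975, 1975],
    "Value": ["13728", "17977", "-", "100"],
})


def prepare_yearly_totals(df):
    """Data preparation: clean the Value column, then sum it per year."""
    cleaned = df.copy()
    # Non-numeric entries such as '-' become NaN; numeric strings become numbers.
    cleaned["Value"] = pd.to_numeric(cleaned["Value"], errors="coerce")
    return cleaned.groupby("Year", as_index=False)["Value"].sum()


def build_figure_spec(totals, title):
    """Visualization: describe a line chart as a plain dict."""
    return {
        "data": [{
            "type": "scatter",
            "mode": "lines",
            "x": totals["Year"].tolist(),
            "y": totals["Value"].tolist(),
        }],
        "layout": {"title": title},
    }


totals = prepare_yearly_totals(raw)
figure = build_figure_spec(totals, "Immigration through the years")
```

With this layout, the plotting call stays a single line and the preparation logic can be checked on its own, without rendering anything.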
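Moreover, the two lines involving replace and groupby cannot be collapsed into a single operation, because the first modifies individual values in the column while the second aggregates across rows.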
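The two operations stay distinct, but pandas does allow writing them as one chained expression if fewer statements are the goal. This is a hedged sketch rather than a recommendation: it swaps replace for pd.to_numeric with errors="coerce" (an assumption that Value was read as strings), and the two 1975 rows are hypothetical.

```python
import pandas as pd

# First two values come from the question's table; the 1975 rows are hypothetical.
df1 = pd.DataFrame({
    "Year": [1974, 1974, 1975, 1975],
    "Value": ["13728", "17977", "-", "100"],
})

yearly = (
    df1
    # Step 1: element-wise cleaning ('-' and other non-numeric values become NaN).
    .assign(Value=lambda d: pd.to_numeric(d["Value"], errors="coerce"))
    # Step 2: aggregation over the rows of each year (NaN values are skipped).
    .groupby("Year", as_index=False)["Value"]
    .sum()
)
```

Even in this form the cleaning and the aggregation are still two separate steps, just written as one statement.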