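I have a pandas DataFrame organized by date that I'm trying to split by year (stored in a column named 'year'). I'd like to get back one DataFrame per year, with names like "df19XX".
I was hoping to write a "for" loop that could handle this... something like...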
...which would return three DataFrames named df1980, df1981 and df1982.
Thanks!
Answer 0 (score: 2)
You can iterate over the groupby:
In [11]: df = pd.DataFrame({"date": pd.date_range("2012-12-28", "2013-01-03"), "A": np.random.rand(7)})
In [12]: df
Out[12]:
A date
0 0.434715 2012-12-28
1 0.208877 2012-12-29
2 0.912897 2012-12-30
3 0.226368 2012-12-31
4 0.100489 2013-01-01
5 0.474088 2013-01-02
6 0.348368 2013-01-03
In [13]: g = df.groupby(df.date.dt.year)
In [14]: for k, v in g:
    ...:     print(k)
    ...:     print(v)
    ...:     print()
    ...:
2012
A date
0 0.434715 2012-12-28
1 0.208877 2012-12-29
2 0.912897 2012-12-30
3 0.226368 2012-12-31
2013
A date
4 0.100489 2013-01-01
5 0.474088 2013-01-02
6 0.348368 2013-01-03
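I would strongly argue that it is preferable to simply keep a dict rather than create separate variables by messing around with the locals() dictionary (I claim that using locals() this way is not "pythonic"):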
In [14]: {k: grp for k, grp in g}
Out[14]:
{2012: A date
0 0.434715 2012-12-28
1 0.208877 2012-12-29
2 0.912897 2012-12-30
3 0.226368 2012-12-31, 2013: A date
4 0.100489 2013-01-01
5 0.474088 2013-01-02
6 0.348368 2013-01-03}
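To connect the dict approach back to the original question, here is a minimal sketch. It assumes the frame already has an integer 'year' column and that names such as df1980 are acceptable as dictionary keys rather than variable names. The sample data and the name `frames` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with a 'year' column, as described in the question.
df = pd.DataFrame({
    "year": [1980, 1980, 1981, 1982, 1982],
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Build one DataFrame per year, keyed by a name such as "df1980".
frames = {f"df{year}": grp for year, grp in df.groupby("year")}

print(sorted(frames))    # the generated key names
print(frames["df1981"])  # the rows belonging to 1981
```

Looking a frame up with `frames["df1981"]` reads almost the same as a variable named df1981, while the set of years stays easy to loop over.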
That said, you might consider computing each group on the fly (rather than storing it in a dict or in variables). You can use get_group:
In [15]: g.get_group(2012)
Out[15]:
A date
0 0.434715 2012-12-28
1 0.208877 2012-12-29
2 0.912897 2012-12-30
3 0.226368 2012-12-31
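One thing to keep in mind is that get_group raises a KeyError for a key that is not among the groups. A small sketch of an on-the-fly lookup that guards against this is below. The helper name `frame_for` is hypothetical, and column A is filled with fixed values instead of random ones so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Same shape as the example frame, but with deterministic values in column A.
df = pd.DataFrame({
    "date": pd.date_range("2012-12-28", "2013-01-03"),
    "A": np.arange(7, dtype=float),
})
g = df.groupby(df.date.dt.year)


def frame_for(year):
    """Return the rows for one year, or an empty frame if the year is absent."""
    if year in g.groups:
        return g.get_group(year)
    return df.iloc[0:0]  # empty frame with the same columns


print(frame_for(2012))
print(frame_for(1999).empty)
```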
Answer 1 (score: 2)
Something like this? Also using @Andy's df:
variables = locals()
for i in [2012, 2013]:
    variables["df{0}".format(i)] = df.loc[df.date.dt.year == i]
df2012
Out[118]:
A date
0 0.881468 2012-12-28
1 0.237672 2012-12-29
2 0.992287 2012-12-30
3 0.194288 2012-12-31
df2013
Out[119]:
A date
4 0.151854 2013-01-01
5 0.855312 2013-01-02
6 0.534075 2013-01-03
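A caveat worth illustrating: the locals() trick works in an interactive session or at module level, but inside a function, writing into locals() does not create a usable local variable. The sketch below uses a placeholder string instead of a DataFrame, and the function name is hypothetical.

```python
def split_inside_function():
    # Inside a function, this write does not create a real local variable.
    locals()["df2012"] = "placeholder"
    try:
        # The name is compiled as a global lookup, and no such global exists.
        return df2012
    except NameError:
        return None


result = split_inside_function()
print(result)
```

This is one practical reason to prefer a dict keyed by year over dynamically created variable names.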