Question

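I have a pile of foreign-trade statistics stacked in a single table/CSV: year, is_export (otherwise the row is an import), country/region, customs code, macro code (a group of customs codes), and value (in USD).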

I would like to group the data with pandas (rather than plain SQL) and get something like:

macro_group=12

2012  2013 2014
country
export

Do I just need to make several groupby calls (on the "keys" I want to build a hierarchy on)?

Edit: all rows have the same layout:

id|Country|Year|Export|Macro|Code|Codename|Value
1|China|2012|1|69|6996700|Articles,of iron or steel wire,n.e.s.|0.0
2|Germany|2012|1|69|6996700|Articles,of iron or steel wire,n.e.s.|59.9
3|Italy|2012|1|69|6996700|Articles,of iron or steel wire,n.e.s.|33.2

What I want to get is:

**Macro e.g. 23**
China total export
2012 2013 2014
432  34  3243

China total import
2012 2013 2014
4534 345  4354

Russia total import...

etc.
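On the question of several groupby calls: `groupby` accepts a list of keys and returns a Series indexed by a hierarchical MultiIndex, so one call can build the whole hierarchy, and `unstack` can then move Year into the columns. A minimal sketch, assuming the column names from the sample above; the rows are hypothetical values for illustration only, and Export is assumed to be 1 for exports and 0 for imports:

```python
import pandas as pd

# Hypothetical rows in the same layout as the sample (Export: 1 = export, 0 = import).
df = pd.DataFrame({
    'Country': ['China', 'China', 'China', 'Germany', 'Germany'],
    'Year':    [2012, 2013, 2012, 2012, 2013],
    'Export':  [1, 1, 0, 1, 1],
    'Macro':   [69, 69, 69, 69, 23],
    'Value':   [10.0, 5.0, 7.5, 59.9, 80.0],
})

# One groupby with several keys builds the whole hierarchy at once:
# the result is a Series indexed by (Macro, Country, Export, Year).
totals = df.groupby(['Macro', 'Country', 'Export', 'Year'])['Value'].sum()

# Move Year from the row index into the columns (missing years become 0),
# then select a single macro group; rows are left as (Country, Export).
table = totals.unstack('Year', fill_value=0).xs(69, level='Macro')
print(table)
```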

Answer 1

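Your expected output is not entirely clear (given the data you provided). I assume you want the total value per country and per year (if not, feel free to correct me):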

import pandas as pd
from io import StringIO  # Python 3 (the Python 2 equivalent was: from StringIO import StringIO)

########### Setup some test data: #############
s = """id|Country|Year|Export|Macro|Code|Codename|Value
1|China|2012|1|69|6996700|Articles,of iron or steel wire,n.e.s.|0.0
2|Germany|2012|1|69|6996700|Articles,of iron or steel wire,n.e.s.|59.9
3|Germany|2013|1|69|6996700|Articles,of iron or steel wire,n.e.s.|80.0
4|Germany|2013|1|69|6996700|Articles,of iron or steel wire,n.e.s.|40.0
5|Italy|2012|1|69|6996700|Articles,of iron or steel wire,n.e.s.|33.2"""

df = pd.read_csv(StringIO(s), sep='|')

########### The real stuff happens here: #############
macro = 69
# Keep only one macro group, then sum Value for each (Country, Year) pair.
group_by = df[df.Macro == macro].groupby(['Country', 'Year'])['Value'].sum()

# Iterate over the countries present in the filtered result, so a country
# with no rows in this macro group cannot cause a KeyError.
for country, per_year in group_by.groupby(level='Country'):
    print('---', country, '---')
    for year, value in per_year.droplevel('Country').items():
        print(year, value)
    print()

This results in the following output:

--- China ---
2012 0.0

--- Germany ---
2012 59.9
2013 120.0

--- Italy ---
2012 33.2
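The answer above does not separate exports from imports. A possible extension toward the per-country export/import blocks shown in the question, sketched with `pivot_table`; the rows are hypothetical, and Export is assumed to be 1 for exports and 0 for imports:

```python
import pandas as pd

# Hypothetical data; Export == 1 means export, 0 means import (assumption).
df = pd.DataFrame({
    'Country': ['China', 'China', 'China', 'China', 'Russia'],
    'Year':    [2012, 2013, 2012, 2013, 2012],
    'Export':  [1, 1, 0, 0, 0],
    'Macro':   [23, 23, 23, 23, 23],
    'Value':   [432.0, 34.0, 4534.0, 345.0, 12.0],
})

macro = 23
# Rows: (Country, Export); columns: Year; cells: summed Value, 0 where missing.
table = df[df.Macro == macro].pivot_table(
    index=['Country', 'Export'], columns='Year',
    values='Value', aggfunc='sum', fill_value=0)

# Print one block per (country, direction), years on one line, totals below.
print('Macro', macro)
for (country, is_export), row in table.iterrows():
    kind = 'export' if is_export == 1 else 'import'
    print(country, 'total', kind)
    print(' '.join(str(year) for year in row.index))
    print(' '.join(str(value) for value in row.values))
    print()
```

`pivot_table` performs the same grouping as a multi-key `groupby` followed by `unstack`, but states the row and column layout in a single call.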
