Question

I have the following data:

    import pandas as pd

    url = 'https://raw.githubusercontent.com/108michael/ms_thesis/master/clean_gdp_data_all.csv'

    c = pd.read_csv(url, index_col=0)
    c = c.loc[(c.GeoName == 'California') &
              (c.ComponentName == 'Real GDP by state')]
    c.head(3)


           GeoName     ComponentName      IndustryClassification  Description                                 2004   2005   2006   2007   2008   2009   2010   2011   2012   2013   2014
    38281  California  Real GDP by state  111-112                 Farms                                      15717  18751  18215  15335  14109  18798  19197  16535  15014  16909      0
    38282  California  Real GDP by state  113-115                 Forestry, fishing, and related activities   6234   6278   7845   7786   7365   7390   7831   8115   8995   9312      0
    38284  California  Real GDP by state  211                     Oil and gas extraction                      7769   8107  10693  12342  12010  17155  14575  15289  18849  16165      0

I would like to run the following code in a for loop, except that I want to run it for every year (2004-2014) and then merge the results together, as shown in the last line of code:

    d = c.sort_values('2004', ascending=False).head(10)[['GeoName',
        'IndustryClassification', 'Description', 'ComponentName', '2004']]

    e = c.sort_values('2005', ascending=False).head(10)[['GeoName',
        'IndustryClassification', 'Description', 'ComponentName', '2005']]

    crgdp = pd.merge(d, e, how='inner', on=['GeoName',
        'IndustryClassification', 'Description', 'ComponentName'])

Answer 1

Here, this should help you get going:

    import pandas as pd

    url = 'https://raw.githubusercontent.com/108michael/ms_thesis/master/clean_gdp_data_all.csv'
    c = pd.read_csv(url, index_col=0)
    d = c.loc[(c.GeoName == 'California') & (c.ComponentName == 'Real GDP by state')]

    keys = ['GeoName', 'IndustryClassification', 'Description', 'ComponentName']

    # Pair each year column with the next one: (2004, 2005), (2005, 2006), ...
    for y1, y2 in zip(c.columns[4:], c.columns[5:]):
        d1 = d.sort_values(y1, ascending=False).head(10)[keys + [y1]]
        e1 = d.sort_values(y2, ascending=False).head(10)[keys + [y2]]
        crgdp = pd.merge(d1, e1, how='inner', on=keys)
        # Write one CSV per pair of years, e.g. 2004-2005.csv
        crgdp.to_csv('{0}-{1}.csv'.format(y1, y2), index=False)
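
Note that this loop writes one CSV per pair of consecutive years. If the aim is a single table covering every year, as the question seems to ask, one option is to build the per-year top-N frames in a loop and fold them together with `functools.reduce`. The following is only a sketch under assumptions: the tiny `toy` frame, the `top_n_by_year` helper, and `n=2` are made up for illustration and are not the poster's data or code.

```python
from functools import reduce

import pandas as pd

KEYS = ['GeoName', 'IndustryClassification', 'Description', 'ComponentName']

# Made-up stand-in for the filtered California frame (not the real data).
toy = pd.DataFrame({
    'GeoName': ['California'] * 4,
    'IndustryClassification': ['111-112', '113-115', '211', '22'],
    'Description': ['Farms', 'Forestry', 'Oil and gas', 'Utilities'],
    'ComponentName': ['Real GDP by state'] * 4,
    '2004': [40, 10, 30, 20],
    '2005': [35, 12, 20, 38],
    '2006': [50, 11, 45, 25],
})


def top_n_by_year(df, year, n):
    """Hypothetical helper: key columns plus `year` for the n largest values of `year`."""
    return df.sort_values(year, ascending=False).head(n)[KEYS + [year]]


# Every non-key column is treated as a year column.
years = [col for col in toy.columns if col not in KEYS]
frames = [top_n_by_year(toy, year, n=2) for year in years]

# Inner-merge all per-year frames on the key columns, one after another.
merged = reduce(lambda left, right: pd.merge(left, right, how='inner', on=KEYS), frames)
print(merged)
```

Because every step is an inner merge, only industries that are in the top n for every year survive. Passing `how='outer'` instead would keep any industry that appears in at least one year's top n, with NaN in the years where it does not.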

Answer 2

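I don't think you can do this the way you want, because all the values in a row are "connected" and belong to that row. You can sort your DF by one column, which reorders all rows together with all of their corresponding values, but the next time you sort by another column you lose the sort order of the first column, and so on...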

Look at the index values and the values in the a and b columns in the following example:

    In [16]: df
    Out[16]:
       a  b  c
    0  0  7  1
    1  6  6  0
    2  7  4  5

    In [17]: df.sort_values(by='a', ascending=False)
    Out[17]:
       a  b  c
    2  7  4  5
    1  6  6  0
    0  0  7  1

    In [18]: df.sort_values(by='b', ascending=False)
    Out[18]:
       a  b  c
    0  0  7  1
    1  6  6  0
    2  7  4  5

    In [19]: df.sort_values(by=['a','b'], ascending=False)
    Out[19]:
       a  b  c
    2  7  4  5
    1  6  6  0
    0  0  7  1

NOTE: it doesn't matter how we sort the data; all the values in each row stay "bound" to each other and to their index.

So you can sort your DF by a, by b, or by ['a','b'], but in the last case your b column won't be monotonically decreasing.
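
The point about the multi-column sort can be checked directly. The following is a small sketch that rebuilds the three-row frame from the answer's example and uses the `is_monotonic_decreasing` property of a pandas Series; the variable names are chosen here for illustration.

```python
import pandas as pd

# Same three-row frame as in the answer's example.
df = pd.DataFrame({'a': [0, 6, 7], 'b': [7, 6, 4], 'c': [1, 0, 5]})

by_both = df.sort_values(by=['a', 'b'], ascending=False)

# 'a' is the primary sort key, so it comes out in decreasing order...
a_sorted = by_both['a'].is_monotonic_decreasing
# ...while 'b' only breaks ties in 'a' and ends up as 4, 6, 7.
b_sorted = by_both['b'].is_monotonic_decreasing

print(a_sorted, b_sorted)
```

Sorting by a list of columns orders rows by the first column and uses the later ones only to break ties, which is why b is not sorted on its own here.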

Looking at your data: if you group it by the "merging" columns and check for duplicates, you'll see that there aren't any:

    In [132]: c.groupby(['GeoName', 'IndustryClassification', 'Description', 'ComponentName']).size().nlargest(3)
    Out[132]:
    GeoName     IndustryClassification  Description       ComponentName
    California  ...                     Federal civilian  Real GDP by state    1
                                        Federal military  Real GDP by state    1
                                        State and local   Real GDP by state    1
    dtype: int64

It shows that each group has only one row. So after merging, all rows will stay as they are, because you can think of the ['GeoName', 'IndustryClassification', 'Description', 'ComponentName'] columns as a primary key (i.e. a unique identifier).
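
Whether the columns ['GeoName', 'IndustryClassification', 'Description', 'ComponentName'] really behave like a primary key can also be tested in code. One possible approach, sketched here on made-up data (the `toy` frame is hypothetical and is not the poster's CSV), is to look for duplicated key combinations and to let `pd.merge` enforce uniqueness through its `validate` argument.

```python
import pandas as pd

KEYS = ['GeoName', 'IndustryClassification', 'Description', 'ComponentName']

# Made-up rows standing in for the real data.
toy = pd.DataFrame({
    'GeoName': ['California'] * 3,
    'IndustryClassification': ['111-112', '113-115', '211'],
    'Description': ['Farms', 'Forestry', 'Oil and gas'],
    'ComponentName': ['Real GDP by state'] * 3,
    '2004': [40, 10, 30],
    '2005': [35, 12, 20],
})

# True if any combination of key values occurs more than once.
has_duplicate_keys = toy.duplicated(subset=KEYS).any()

left = toy[KEYS + ['2004']]
right = toy[KEYS + ['2005']]

# validate='one_to_one' makes pandas raise MergeError if the keys are not
# unique on either side, so a successful merge confirms one row per key.
merged = pd.merge(left, right, how='inner', on=KEYS, validate='one_to_one')
print(has_duplicate_keys, len(merged))
```

With unique keys on both sides, the merge can neither duplicate nor multiply rows; each key matches at most one row in the other frame.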

pandas: running multiple commands with a for loop
