Refining Python code (skipping repeated steps)

Time: 2019-04-14 14:10:31

Tags: python pandas numpy

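I've been working on a project where I need to split 10-12 columns and stack them. The only problem is that I have to do it repeatedly: once I split one column, I stack it, then repeat the same steps for the next column. The code runs without problems, but I'm looking for a more efficient way.

Currently I repeat the same process 10-12 times, and the code takes a while to run because there are more than 50 column names.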

df1 = (df1.set_index(['Announced Date', 'Completed Date', 'Target Company',
                      'Target Dominant Sector', 'Target Dominant Country', 'Target State',
                      'Target Financial Advisor', 'Target Legal Advisor', 'Target Broker', 
                      'Target Accountant', 'Target PR', 'Target Consultant',
                      'Bidder Company', 'Bidder Dominant Country', 'Bidder State',
                      'Bidder Financial Advisor', 'Bidder Legal Advisor', 'Bidder Broker', 
                      'Bidder Accountant', 'Bidder PR', 'Bidder Consultant', 
                      'Seller Company', 'Seller Dominant Country', 'Seller State', 
                      'Seller Financial Advisor', 'Seller Legal Advisor', 'Seller Broker', 
                      'Seller Accountant', 'Seller PR', 'Seller Consultant',
                      'Reported Revenue Multiple Y1', 'Reported EBIT Multiple Y1', 'Reported EBITDA Multiple Y1', 
                      'Reported PE Multiple Y1', 'Reported Book Value Multiple Y1', 'Deal Description', 
                      'Deal Type', 'Deal Nature', "Deal Value USD(m)", 
                      'Deal ID', 'Deal within regular criteria','Target companies', 
                      'Target FAs', 'Taget LAs', "Taget Brokers", 
                      "Target Accountants", 'Target PRs','Target Consultants',
                      'Bidder Companies', 'Bidder FAs', 'Bidder LAs', 
                      "Bidder Brokers", "Bidder Accountants","Bidder PRs",
                      "Bidder Consultants",'Seller Companies']).stack()
        .reset_index(level=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,
                          29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55], name='Seller FAs')
        .reset_index(drop=True))

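I know that instead of typing out all the column names I can use

df1.columns

and instead of typing the numbers 0-55 individually I can use

np.arange(56)

but I can't work them into the code. Can someone help me make this more efficient?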

1 Answer:

Answer 0: (score: 0)

You can use:

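# Assumes the first 56 columns of df1 are the identifier columns from the
# question and the remaining column(s) are the ones to stack.
id_cols = df1.columns[:56].tolist()
df1 = (df1.set_index(id_cols)
          .stack()
          # reset_index expects a list of levels, so use list(range(56)) here
          .reset_index(level=list(range(56)), name='Seller FAs')
          .reset_index(drop=True))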
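
To check the pattern on something smaller than the real data, here is a minimal sketch on a made-up frame. The helper name `stack_tail` and the columns `id1`, `id2`, `v1`, `v2` are hypothetical and only illustrate the idea; it assumes the identifier columns come first and everything after them should be stacked.

```python
import pandas as pd


def stack_tail(df, n_id, name):
    """Keep the first n_id columns as identifiers and stack the rest into one column."""
    id_cols = df.columns[:n_id].tolist()
    return (df.set_index(id_cols)
              .stack()
              # move the identifier levels back into columns, name the values
              .reset_index(level=list(range(n_id)), name=name)
              # drop the leftover level holding the stacked column labels
              .reset_index(drop=True))


# Hypothetical toy data: two identifier columns, two columns to stack
toy = pd.DataFrame({
    'id1': ['x', 'y'],
    'id2': [1, 2],
    'v1': ['a', 'c'],
    'v2': ['b', 'd'],
})

out = stack_tail(toy, n_id=2, name='val')
print(out)
```

Each input row yields one output row per stacked column, with the identifier values repeated.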

But perhaps DataFrame.melt would be a better fit here:

df1 = pd.DataFrame({
         'A':[4,5,4],
         'B':[7,2,3],
         'C':[1,3,1],
})

print (df1)
   A  B  C
0  4  7  1
1  5  2  3
2  4  3  1

df1 = df1.rename_axis('a').reset_index().melt('a',var_name='b', value_name='c')
print (df1)
   a  b  c
0  0  A  4
1  1  A  5
2  2  A  4
3  0  B  7
4  1  B  2
5  2  B  3
6  0  C  1
7  1  C  3
8  2  C  1

Sort if necessary:

df2 = df1.sort_values(['a','b'])
print (df2)
   a  b  c
0  0  A  4
3  0  B  7
6  0  C  1
1  1  A  5
4  1  B  2
7  1  C  3
2  2  A  4
5  2  B  3
8  2  C  1
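
Applied to a frame shaped like the one in the question, the same idea might look like the sketch below. The toy columns (`Seller FA 1`, `Seller FA 2`) and the choice of identifier columns are assumptions for illustration, not the asker's real data; `'variable'` is the default name `melt` gives the column that holds the original column labels.

```python
import pandas as pd

# Hypothetical toy data: two identifier columns and two columns to stack
toy = pd.DataFrame({
    'Deal ID': [1, 2],
    'Target Company': ['T1', 'T2'],
    'Seller FA 1': ['a', 'c'],
    'Seller FA 2': ['b', None],
})

id_cols = ['Deal ID', 'Target Company']  # assumed identifier columns

melted = (toy.melt(id_vars=id_cols, value_name='Seller FAs')
             .dropna(subset=['Seller FAs'])        # drop empty cells
             .sort_values(id_cols + ['variable'])  # group rows per deal
             .drop(columns='variable')             # original column labels no longer needed
             .reset_index(drop=True))
print(melted)
```

With `melt`, the identifier columns are named once in `id_vars`, so there is no need to list level numbers at all.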