Loop over columns to slice a dataset

Time: 2018-07-06 02:31:44

Tags: python pandas loops slice

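I have the following dataset: the columns named 2, 3, 4 ... 9 are filled with topic names that overlap with one another. Pageviews is the outcome variable.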

        2                           3                       Pageviews
0       Financial Services          Consumer Products       4106.0
1       Consumer Products           ...                     3368.0
2       Consumer Products           ...                     1025.0
3       Collaboration               ...                     7840.0
4       Future of Supply Chains     ...                     2076.0

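I want to slice each topic column (2, 3, 4, ...) together with Pageviews and append the slices, so that I end up with a single dataframe that has one topic column and Pageviews.

I am used to looping in Stata, where you can loop over column names using x, but I know Python works quite differently.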

I started with:
for x in range(2, 9):
    df_x = df[['Pageviews',  df.x]]

But Python does not recognize df.x.

How can I loop over the column names? Can I use the iterator to create new dataframes?

Thanks!

Edit

My desired output is

                                       Col        Pageviews
0                           Financial Services      4106.0
1                            Consumer Products      3368.0
2                            Consumer Products      1025.0
3                                 Collaboration     7840.0
4                      Future of Supply Chains      2076.0
5                          Future of Reporting      2123.0
6                    Sustainability Management     15576.0
7                                 Human Rights        52.0
8                                      BSR News      903.0
9                       Energy and Extractives      1232.0
10                                  HERproject       616.0
11                   Sustainability Management     10697.0

where Col is the result of appending columns 2, 3, 4 ..., and Pageviews is the result of appending the corresponding Pageviews column.

2 Answers:

Answer 0 (score: 1)

Use melt

df.melt('Pageviews').drop(columns='variable')
Out[644]: 
    Pageviews                 value
0        1210      ConsumerProducts
1        1528         Collaboration
2        1716     FinancialServices
3        1403         Collaboration
4        1090      ConsumerProducts
5        1210      ConsumerProducts
6        1528  FutureofSupplyChains
7        1716      ConsumerProducts
8        1403     FinancialServices
9        1090  FutureofSupplyChains
10       1210     FinancialServices
11       1528     FinancialServices
12       1716         Collaboration
13       1403  FutureofSupplyChains
14       1090     FinancialServices
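
To get the two-column layout asked for in the question (Col and Pageviews), melt can name the value column directly. This is only a minimal sketch: the small sample frame, the string column labels "2" and "3", and the name long_df are assumptions made for illustration, not the asker's real data.

```python
import pandas as pd

# Hypothetical sample data; the real frame has topic columns 2..9.
df = pd.DataFrame({
    "2": ["Financial Services", "Consumer Products", "Collaboration"],
    "3": ["Consumer Products", None, "Financial Services"],
    "Pageviews": [4106.0, 3368.0, 7840.0],
})

long_df = (
    df.melt(id_vars="Pageviews", value_name="Col")  # one row per (row, topic column)
      .drop(columns="variable")                     # the original column label is not needed
      .dropna(subset=["Col"])                       # skip cells that had no topic
      .reset_index(drop=True)
      .loc[:, ["Col", "Pageviews"]]                 # column order from the question
)

print(long_df)
```

Each Pageviews value is repeated once for every topic found in its row, which matches the appended layout described in the question.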

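Answer 1 (score: 0)

I think you are looking for some kind of stack approach rather than an iterative one (in general, iteration is a last resort when working with dataframes, because there are usually vectorized methods for most data-reshaping tasks).

Take this example dataframe: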

>>> df
                    0                        1                        2  \
0   Consumer Products        Consumer Products       Financial Services   
1       Collaboration  Future of Supply Chains       Financial Services   
2  Financial Services        Consumer Products            Collaboration   
3       Collaboration       Financial Services  Future of Supply Chains   
4   Consumer Products  Future of Supply Chains       Financial Services   

   Pageviews  
0       1210  
1       1528  
2       1716  
3       1403  
4       1090  

You can do the following:

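new_df = (df.set_index('Pageviews')
          .stack()
          .reset_index(level=0)      # move Pageviews back to a column
          .reset_index(drop=True))   # replace the leftover column labels with 0..n-1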

>>> new_df
    Pageviews                        0
0        1210        Consumer Products
1        1210        Consumer Products
2        1210       Financial Services
3        1528            Collaboration
4        1528  Future of Supply Chains
5        1528       Financial Services
6        1716       Financial Services
7        1716        Consumer Products
8        1716            Collaboration
9        1403            Collaboration
10       1403       Financial Services
11       1403  Future of Supply Chains
12       1090        Consumer Products
13       1090  Future of Supply Chains
14       1090       Financial Services
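
For completeness, the literal loop from the question can also be made to work. df.x looks for a column literally named "x", whereas bracket indexing, df[col], uses the value of the loop variable. The following is a sketch only: the sample frame, the string column labels, and the names topic_cols, pieces and result are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real frame has topic columns 2..9.
df = pd.DataFrame({
    "2": ["Financial Services", "Consumer Products", "Collaboration"],
    "3": ["Consumer Products", None, "Financial Services"],
    "Pageviews": [4106.0, 3368.0, 7840.0],
})

# Every column except the outcome variable is treated as a topic column.
topic_cols = [c for c in df.columns if c != "Pageviews"]

pieces = []
for col in topic_cols:
    # Bracket indexing uses the loop variable's value as the column label.
    piece = df[[col, "Pageviews"]].rename(columns={col: "Col"})
    pieces.append(piece)

# Append the slices, then drop rows whose topic cell was empty.
result = (
    pd.concat(pieces, ignore_index=True)
      .dropna(subset=["Col"])
      .reset_index(drop=True)
)

print(result)
```

Collecting the slices in a list and calling pd.concat once avoids growing a dataframe inside the loop. For this kind of reshaping the vectorized melt or stack approaches are usually the shorter option.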