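I have the following dataset: the columns named 2, 3, 4, ..., 9 are filled with topic names that overlap across columns. Pageviews is the outcome variable.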
2 3 Pageviews
0 Financial Services Consumer Products 4106.0
1 Consumer Products ... 3368.0
2 Consumer Products ... 1025.0
3 Collaboration ... 7840.0
4 Future of Supply Chains ... 2076.0
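I want to slice each topic column (2, 3, 4, ...) together with Pageviews and append the slices, so that I end up with a single dataframe that has one topic column and Pageviews.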
I am used to looping in Stata, where you can use x to loop over column names, but I know Python works quite differently.
I started with:
for x in range(2, 9):
    df_x = df[['Pageviews', df.x]]
But Python does not recognize df.x.
How do I loop over column names? Can I use the iterator to create new dataframes?
Thanks!
Edit
My desired output is:
Col Pageviews
0 Financial Services 4106.0
1 Consumer Products 3368.0
2 Consumer Products 1025.0
3 Collaboration 7840.0
4 Future of Supply Chains 2076.0
5 Future of Reporting 2123.0
6 Sustainability Management 15576.0
7 Human Rights 52.0
8 BSR News 903.0
9 Energy and Extractives 1232.0
10 HERproject 616.0
11 Sustainability Management 10697.0
where Col is the result of appending columns 2, 3, 4, ... and Pageviews is the result of appending the corresponding Pageviews values.
Answer 0 (score: 1)
Use melt:
df.melt('Pageviews').drop(columns='variable')
Out[644]:
Pageviews value
0 1210 ConsumerProducts
1 1528 Collaboration
2 1716 FinancialServices
3 1403 Collaboration
4 1090 ConsumerProducts
5 1210 ConsumerProducts
6 1528 FutureofSupplyChains
7 1716 ConsumerProducts
8 1403 FinancialServices
9 1090 FutureofSupplyChains
10 1210 FinancialServices
11 1528 FinancialServices
12 1716 Collaboration
13 1403 FutureofSupplyChains
14 1090 FinancialServices
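To get the layout the question asks for (a Col column followed by Pageviews), the melt call can be extended a little. The following is a minimal sketch rather than the answerer's code: the sample frame is made up, the topic columns are assumed to be labelled with the strings "2" and "3", and empty cells are assumed to be missing values that should be dropped.

```python
import pandas as pd

# Hypothetical sample data; the real frame has topic columns 2 through 9.
df = pd.DataFrame({
    "2": ["Financial Services", "Consumer Products", "Collaboration"],
    "3": ["Consumer Products", None, "Human Rights"],
    "Pageviews": [4106.0, 3368.0, 7840.0],
})

long_df = (
    df.melt(id_vars="Pageviews", value_name="Col")  # one row per (row, topic column)
      .drop(columns="variable")                     # the source column label is not needed
      .dropna(subset=["Col"])                       # assumption: skip empty topic cells
      .reset_index(drop=True)
      [["Col", "Pageviews"]]                        # reorder to match the desired output
)
print(long_df)
```

Each Pageviews value is repeated once for every topic that appears in its row, which is what appending the slices by hand would also produce.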
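Answer 1 (score: 0)
I think you are looking for some kind of stack method rather than an iterative one. When working with dataframes, iteration is generally a last resort, because most data-reshaping tasks have a vectorized method.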
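For comparison, the loop the question was reaching for can still be written by iterating over the column labels and concatenating the slices. Note that df.x looks up a column literally named x, whereas df[x] uses the value of the loop variable. This is only a sketch under assumptions: the sample data is invented, and the topic columns are assumed to be every column other than Pageviews.

```python
import pandas as pd

# Hypothetical sample data standing in for the real frame.
df = pd.DataFrame({
    "2": ["Financial Services", "Consumer Products", "Collaboration"],
    "3": ["Consumer Products", None, "Human Rights"],
    "Pageviews": [4106.0, 3368.0, 7840.0],
})

# Assumption: every column except Pageviews holds topic names.
topic_cols = [c for c in df.columns if c != "Pageviews"]

pieces = []
for col in topic_cols:
    # Select by label with df[[...]] and give every slice the same column name.
    piece = df[[col, "Pageviews"]].rename(columns={col: "Col"})
    pieces.append(piece)

# Append all slices and drop rows where the topic cell was empty.
stacked = pd.concat(pieces, ignore_index=True).dropna(subset=["Col"])
print(stacked)
```

Collecting the slices in a list and calling pd.concat once avoids creating a separately named variable per column, which is what df_x would have required.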
Take this example dataframe:
>>> df
0 1 2 \
0 Consumer Products Consumer Products Financial Services
1 Collaboration Future of Supply Chains Financial Services
2 Financial Services Consumer Products Collaboration
3 Collaboration Financial Services Future of Supply Chains
4 Consumer Products Future of Supply Chains Financial Services
Pageviews
0 1210
1 1528
2 1716
3 1403
4 1090
You can do the following:
new_df = (df.set_index('Pageviews')
.stack()
.reset_index(0))
>>> new_df
Pageviews 0
0 1210 Consumer Products
1 1210 Consumer Products
2 1210 Financial Services
3 1528 Collaboration
4 1528 Future of Supply Chains
5 1528 Financial Services
6 1716 Financial Services
7 1716 Consumer Products
8 1716 Collaboration
9 1403 Collaboration
10 1403 Financial Services
11 1403 Future of Supply Chains
12 1090 Consumer Products
13 1090 Future of Supply Chains
14 1090 Financial Services
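If a named topic column and a fresh 0..n-1 index are wanted, the same stack approach can be extended as below. This is a hedged sketch, not part of the original answer: the sample frame is invented, the name Col is taken from the question's desired output, and the Pageviews values are assumed to be usable as an index.

```python
import pandas as pd

# Hypothetical sample data with two topic columns.
df = pd.DataFrame({
    "0": ["Consumer Products", "Collaboration", "Financial Services"],
    "1": ["Consumer Products", "Future of Supply Chains", "Consumer Products"],
    "Pageviews": [1210, 1528, 1716],
})

new_df = (
    df.set_index("Pageviews")
      .stack()                   # Series indexed by (Pageviews, original column label)
      .rename("Col")             # name the values so the resulting column is "Col"
      .reset_index(level=0)      # move Pageviews back into a column
      .reset_index(drop=True)    # discard the leftover column labels in the index
)
print(new_df)
```

Without the final reset_index(drop=True), the index would hold the original column labels rather than a running row number.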