I have a pandas DataFrame whose column headers contain information that I want to attach to every row. The DataFrame looks like this:
print(working_df)
Retail Sales of Electricity : Arkansas : Industrial : Annual \
Year
0 16709.19272
1 16847.75502
2 16993.92202
3 16774.69902
4 14710.29400
Retail Sales of Electricity : Arizona : Residential : Annual \
Year
0 33138.47860
1 32922.97001
2 33079.07402
3 32448.13802
4 32846.84298
[8 rows x 701 columns]
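How can I extract two variables from the column names (the state, e.g. Arizona, and the sector, e.g. Industrial or Residential) and put each of them as a value in its own new column?
I'd like the result to look like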
Year State Sector Sales
0 Arizona Residential 33138.47860
1 Arizona Residential 32922.97001
2 Arizona Residential 33079.07402
3 Arizona Residential 32448.13802
4 Arizona Residential 32846.84298
0 Arkansas Industrial 16709.19272
1 Arkansas Industrial 16847.75502
2 Arkansas Industrial 16993.92202
3 Arkansas Industrial 16774.69902
4 Arkansas Industrial 14710.29400
Answer 0 (score: 3)
I think I'd do something like
d2 = df.unstack().reset_index()
d2 = d2.rename(columns={0: "Sales"})
parts = d2.pop("level_0").str.split(":")
d2["State"] = [p[1].strip() for p in parts]
d2["Sector"] = [p[2].strip() for p in parts]
which produces
>>> d2
Year Sales State Sector
0 0 16709.19272 Arkansas Industrial
1 1 16847.75502 Arkansas Industrial
2 2 16993.92202 Arkansas Industrial
3 3 16774.69902 Arkansas Industrial
4 4 14710.29400 Arkansas Industrial
5 0 33138.47860 Arizona Residential
6 1 32922.97001 Arizona Residential
7 2 33079.07402 Arizona Residential
8 3 32448.13802 Arizona Residential
9 4 32846.84298 Arizona Residential
[10 rows x 4 columns]
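For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same steps. It assumes a small `df` rebuilt by hand from the two columns shown in the question (the real frame has 701 columns), with the index named `Year`. The final column reordering is an addition of mine to match the layout asked for in the question.

```python
import pandas as pd

# Assumption: a tiny stand-in for the real 701-column frame,
# rebuilt from the two columns printed in the question.
df = pd.DataFrame(
    {
        "Retail Sales of Electricity : Arkansas : Industrial : Annual": [
            16709.19272, 16847.75502, 16993.92202, 16774.69902, 14710.29400,
        ],
        "Retail Sales of Electricity : Arizona : Residential : Annual": [
            33138.47860, 32922.97001, 33079.07402, 32448.13802, 32846.84298,
        ],
    }
)
df.index.name = "Year"

# Wide -> long: the old column labels land in a column named "level_0",
# the values in a column named 0.
d2 = df.unstack().reset_index()
d2 = d2.rename(columns={0: "Sales"})

# Split each label on ":" and keep the 2nd and 3rd pieces.
parts = d2.pop("level_0").str.split(":")
d2["State"] = [p[1].strip() for p in parts]
d2["Sector"] = [p[2].strip() for p in parts]

# Optional: reorder to the layout requested in the question.
d2 = d2[["Year", "State", "Sector", "Sales"]]
print(d2)
```

If the real index is unnamed, `reset_index()` would produce `level_1` instead of `Year`, so the last line would need adjusting.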
You could get fancier and maybe do something with str.extract, e.g.
str.extract(r".*?:\s*(?P<State>.*?)\s*:\s*(?P<Sector>.*?)\s*:.*")
but I don't think it's worth it.
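For completeness, this is roughly how the `str.extract` variant could be wired up. It is only a sketch: the answer does not say what the regex is applied to, so calling it on the popped `level_0` column is my assumption, and the two-row `df` is a hypothetical stand-in for the real data.

```python
import pandas as pd

# Assumption: a shortened stand-in frame with the same column labels.
df = pd.DataFrame(
    {
        "Retail Sales of Electricity : Arkansas : Industrial : Annual": [
            16709.19272, 16847.75502,
        ],
        "Retail Sales of Electricity : Arizona : Residential : Annual": [
            33138.47860, 32922.97001,
        ],
    }
)
df.index.name = "Year"

d2 = df.unstack().reset_index().rename(columns={0: "Sales"})

# Named groups become the column names of the extracted frame.
pattern = r".*?:\s*(?P<State>.*?)\s*:\s*(?P<Sector>.*?)\s*:.*"
extracted = d2.pop("level_0").str.extract(pattern)

# Attach the State and Sector columns next to Year and Sales.
d2 = pd.concat([d2, extracted], axis=1)
print(d2)
```

The regex handles the whitespace stripping itself, at the cost of being harder to read than the split-and-strip version.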