I have a CSV in the following format:
print rfd.iloc[:5,:5]
   Sub-division                            January 2010 Actual  January 2010 Normal  January 2011 Actual  February 2010 Actual
0  Andaman and Nicobar Islands                            98.2                 53.7                222.5                   5.8
1  Arunachal Pradesh                                       0.4                 50.1                 37.6                  10.0
2  Assam and Meghalaya                                     0.2                 16.4                  9.0                   3.4
3  Nagaland,Manipur, Mizoram, and Tripura                  0.9                 13.7                  7.9                  10.9
4  Sub-Himalayan,West Bengal & Sikkim                      1.7                 26.6                  7.1                   6.4
How can I convert this to multi-level columns? The first level should be Year, followed by Month and type.
rfd.columns
Out[89]:
Index([u'Sub-division ', u'January 2010 Actual ', u'January 2010 Normal ',
u'January 2011 Actual ', u'February 2010 Actual ',
....
u'December 2010 Normal ', u' December 2011 Actual '],
dtype='object')
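I tried something like rfd.columns = rfd.columns.str.split(" "),
but then the DataFrame raises TypeError: unhashable type: 'list'.
If it were just one file, I could edit the CSV by hand and load it, but this is a repeatable process, so I'm looking for a solution that lets me iterate over files.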
Adding two rows as a dict:
{'April 2010 Normal': {0: 81.5, 1: 278.80000000000001},
'April 2010 Actual': {0: 12.699999999999999, 1: 245.80000000000001},
'April 2011 Actual': {0: 83.700000000000003, 1: 114.7},
'August 2010 Actual': {0: 550.0, 1: 343.30000000000001},
'August 2010 Normal': {0: 403.80000000000001, 1: 359.89999999999998},
'August 2011 Actual': {0: 513.0, 1: 225.80000000000001},
'December 2010 Normal': {0: 145.5, 1: 38.399999999999999},
'December 2010 Actual': {0: 254.40000000000001, 1: 6.0},
'December 2011 Actual': {0: 246.30000000000001, 1: 10.300000000000001},
'February 2010 Actual': {0: 5.7999999999999998, 1: 10.0},
'February 2010 Normal': {0: 29.199999999999999, 1: 98.0},
'February 2011 Actual': {0: 81.900000000000006, 1: 36.799999999999997},
'January 2010 Normal': {0: 53.700000000000003, 1: 50.100000000000001},
'January 2010 Actual': {0: 98.200000000000003, 1: 0.40000000000000002},
'January 2011 Actual': {0: 222.5, 1: 37.600000000000001},
'July 2010 Normal': {0: 407.69999999999999, 1: 536.10000000000002},
'July 2010 Actual': {0: 522.10000000000002, 1: 426.0},
'July 2011 Actual': {0: 575.79999999999995, 1: 553.5},
'June 2010 Normal': {0: 438.60000000000002, 1: 500.39999999999998},
'June 2011 Actual': {0: 418.39999999999998, 1: 336.80000000000001},
'June 2010 Actual': {0: 435.0, 1: 397.30000000000001},
'March 2010 Normal': {0: 25.0, 1: 179.69999999999999},
'March 2010 Normal': {0: 20.5, 1: 164.40000000000001},
'March 2011 Actual': {0: 305.5, 1: 121.5},
'March 2010 Actual': {0: 0.40000000000000002, 1: 143.59999999999999},
'May 2010 Actual': {0: 310.69999999999999, 1: 273.80000000000001},
'May 2010 Normal': {0: 358.5, 1: 291.89999999999998},
'May 2011 Actual': {0: 305.69999999999999, 1: 157.80000000000001},
'November 2010 Normal': {0: 253.69999999999999, 1: 45.799999999999997},
'November 2010 Actual': {0: 281.39999999999998, 1: 59.700000000000003},
'November 2011 Actual': {0: 126.0, 1: 19.800000000000001},
'October 2010 Actual': {0: 415.19999999999999, 1: 84.400000000000006},
'October 2010 Normal': {0: 296.69999999999999, 1: 183.0},
'October 2011 Actual': {0: 183.80000000000001, 1: 46.799999999999997},
'September 2010 Normal': {0: 432.39999999999998, 1: 371.60000000000002},
'September 2010 Actual': {0: 261.30000000000001, 1: 407.39999999999998},
'September 2011 Actual': {0: 770.89999999999998, 1: 262.0},
'Sub-division': {0: 'Andaman and Nicobar Islands ', 1: 'Arunachal Pradesh'},
'october 2010 Normal': {0: 297.80000000000001, 1: 159.09999999999999}}
Answer 0: (score: 1)
I'm fairly sure this isn't the best way to do it, and it's probably far from optimal:
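import pandas as pd

a = pd.read_csv('data.csv', sep=';')
# The question's headers carry stray spaces (e.g. 'Sub-division '), so strip them
a.columns = a.columns.str.strip()
# Long form: one row per (original column label, Sub-division)
b = a.set_index('Sub-division').unstack().reset_index()
c = b['level_0']
# Pull Month, Year and Level out of labels such as 'January 2010 Actual'
d = c.str.extract(r'(?P<Month>[A-Za-z]*) +(?P<Year>[0-9][\w\d]*) +(?P<Level>[A-Za-z]*)')
e = pd.concat([b[['Sub-division', 0]], d], axis=1)
f = e.set_index(['Sub-division', 'Year', 'Month', 'Level'])
# Pivot the three parts back into the columns as a MultiIndex
f = f.unstack(['Year', 'Month', 'Level'])
f.columns = f.columns.droplevel(0)
# Sort the columns, starting from the outermost level (Year)
f = f.sort_index(level=0, axis=1)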
It does what you want, though. The method you're looking for is probably Series.str.extract.
Output:
Year                                       2010                          2011
Month                                  February   January             January
Level                                    Actual    Actual    Normal    Actual
Sub-division
Andaman and Nicobar Islands                 5.8      98.2      53.7     222.5
Arunachal Pradesh                          10.0       0.4      50.1      37.6
Assam and Meghalaya                         3.4       0.2      16.4       9.0
Nagaland,Manipur, Mizoram and Tripura      10.9       0.9      13.7       7.9
Sub-Himalayan,West Bengal & Sikkim          6.4       1.7      26.6       7.1
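A possibly shorter route is to split the headers themselves and assign them as a MultiIndex. The attempt with rfd.columns.str.split(" ") fails because it produces lists, which are unhashable, while tuples are accepted as column labels. What follows is only a sketch under assumptions, not part of the code above: the helper name to_multilevel and the small inline frame (values copied from the question's sample) are made up for illustration, and it assumes every header consists of exactly three whitespace-separated parts.

```python
import pandas as pd

# Tiny stand-in for the real CSV; values copied from the question's sample.
rfd = pd.DataFrame({
    "Sub-division ": ["Andaman and Nicobar Islands", "Arunachal Pradesh"],
    "January 2010 Actual ": [98.2, 0.4],
    "January 2010 Normal ": [53.7, 50.1],
    "January 2011 Actual ": [222.5, 37.6],
    "February 2010 Actual ": [5.8, 10.0],
})


def to_multilevel(df, id_col="Sub-division"):
    """Hypothetical helper: return a copy with (Year, Month, Type) columns."""
    out = df.copy()
    out.columns = out.columns.str.strip()   # drop stray spaces in the headers
    out = out.set_index(id_col)

    tuples = []
    for col in out.columns:
        # "January 2010 Actual" -> Month, Year, Type.
        # Raises ValueError if a header does not have exactly three parts.
        month, year, kind = col.split()
        tuples.append((year, month, kind))  # tuples are hashable, lists are not

    out.columns = pd.MultiIndex.from_tuples(tuples, names=["Year", "Month", "Type"])
    return out.sort_index(axis=1)


result = to_multilevel(rfd)
print(result)
```

Since the process has to be repeated for several files, the same helper could be called in a loop, for example to_multilevel(pd.read_csv(path)) for each path.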
pandas has dedicated tools for working with time series, so you could probably represent what you have here in a better way.
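To illustrate the time-series remark, here is a rough sketch that turns each Month/Year pair into a monthly Period and keeps only the type (Actual, Normal) in the columns. It is one possible layout, not the only one. The column names label, value and Period are assumptions introduced for this example, the inline frame again copies values from the question's sample, and the "%B %Y" format assumes English month names.

```python
import pandas as pd

# Small stand-in frame; values copied from the question's sample.
rfd = pd.DataFrame({
    "Sub-division": ["Andaman and Nicobar Islands", "Arunachal Pradesh"],
    "January 2010 Actual": [98.2, 0.4],
    "January 2010 Normal": [53.7, 50.1],
    "January 2011 Actual": [222.5, 37.6],
    "February 2010 Actual": [5.8, 10.0],
})

# Long form: one row per (Sub-division, original header).
long_df = rfd.melt(id_vars="Sub-division", var_name="label", value_name="value")

# Split headers such as "January 2010 Actual" into named parts.
parts = long_df["label"].str.extract(
    r"(?P<Month>[A-Za-z]+)\s+(?P<Year>\d{4})\s+(?P<Type>[A-Za-z]+)"
)

# Parse "January 2010" into a timestamp, then reduce it to a monthly period.
dates = pd.to_datetime(parts["Month"] + " " + parts["Year"], format="%B %Y")
long_df["Period"] = dates.dt.to_period("M")
long_df["Type"] = parts["Type"]

# One row per (Sub-division, month), one column per type.
ts = long_df.set_index(["Sub-division", "Period", "Type"])["value"].unstack("Type")
print(ts)
```

With real periods in the index, months sort chronologically instead of alphabetically, and combinations that have no data (such as a Normal value for 2011) simply show up as NaN.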