给出以下数据框:
d2=pd.DataFrame({'Item':['items','y','z','x'],
'other':['others','bb','cc','dd']})
d2
Item other
0 items others
1 y bb
2 z cc
3 x dd
我想创建一个多索引的标头集,以便当前标头变为0级,当前顶行变为1级。
提前致谢!
答案 0 :(得分:1)
使用参数append = True的第一列转置DataFrame,set_index,然后转置回来。
d2.T.set_index(0, append=1).T
答案 1 :(得分:1)
另一个解决方案是创建MultiIndex.from_tuples
:
cols = list(zip(d2.columns, d2.iloc[0,:]))
c1 = pd.MultiIndex.from_tuples(cols, names=[None, 0])
print (pd.DataFrame(data=d2[1:].values, columns=c1, index=d2.index[1:]))
Item other
0 items others
1 y bb
2 z cc
3 x dd
或者如果列名不重要:
cols = list(zip(d2.columns, d2.iloc[0,:]))
d2.columns = pd.MultiIndex.from_tuples(cols)
print (d2[1:])
Item other
items others
1 y bb
2 z cc
3 x dd
计时:
len(df)=400k
:
In [63]: %timeit jez(d22)
100 loops, best of 3: 6.22 ms per loop
In [64]: %timeit piR(d2)
10 loops, best of 3: 84.9 ms per loop
len(df)=40
:
In [70]: %timeit jez(d22)
The slowest run took 4.61 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 941 µs per loop
In [71]: %timeit piR(d2)
The slowest run took 4.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.36 ms per loop
<强>代码强>:
import pandas as pd
d2=pd.DataFrame({'Item':['items','y','z','x'],
'other':['others','bb','cc','dd']})
print (d2)
d2 = pd.concat([d2]*100000).reset_index(drop=True)
#d2 = pd.concat([d2]*10).reset_index(drop=True)
d22 = d2.copy()
def piR(d2):
return (d2.T.set_index(0, append=1).T)
def jez(d2):
cols = list(zip(d2.columns, d2.iloc[0,:]))
c1 = pd.MultiIndex.from_tuples(cols, names=[None, 0])
return pd.DataFrame(data=d2[1:].values, columns=c1, index=d2.index[1:])
print (piR(d2))
print (jez(d22))
print ((piR(d2) == jez(d22)).all())
Item items True
other others True
dtype: bool