I am trying to create a data frame in wide format and then convert it to long format, as described here: https://medium.com/@wangyuw/data-reshaping-with-pandas-explained-80b2f51f88d2
import pandas as pd
df = pd.DataFrame({'Mode': ['car', 'car', 'car', 'air', 'air', 'car', 'car', 'air', 'air'],'id':[1,2,3,4,5,6,7,8,9],'time.air': [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],'time.car': [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9]})
But when I then call wide_to_long with the following code, I do not get the desired output:
l = pd.wide_to_long(df, stubnames='time', i=['id'], j='alternate',sep=".")
Can someone help me figure out what I am doing wrong?
Answer 0 (score: 1)
Pass suffix=r'\w+' so that non-numeric suffixes such as "air" and "car" are matched:
l = pd.wide_to_long(df, stubnames='time', i=['id'], j='alternate',sep=".", suffix=r'\w+')
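For reference, here is a minimal self-contained sketch of this fix using the data from the question. The name long_df and the final reset_index() call are additions for readability and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'Mode': ['car', 'car', 'car', 'air', 'air', 'car', 'car', 'air', 'air'],
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'time.air': [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
    'time.car': [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9],
})

# suffix=r'\w+' lets wide_to_long match word suffixes such as "air" and "car";
# the default suffix, r'\d+', only matches digits.
long_df = pd.wide_to_long(
    df, stubnames='time', i='id', j='alternate', sep='.', suffix=r'\w+'
).reset_index()  # turn the (id, alternate) index back into ordinary columns

print(long_df.head())
```

Each id should then appear twice, once per alternative, with the values of time.air and time.car stacked into a single time column.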
Answer 1 (score: 0)
The problem is that, by default, the part of each column name after the . must be an integer:
print(df)
   Mode  id  time.air  time.car
0   car   1       2.8       3.4
1   car   2       2.9       3.8
2   car   3       2.2       2.9
3   air   4       2.0       3.2
4   air   5       1.8       2.8
5   car   6       1.9       2.4
6   car   7       2.2       3.3
7   air   8       2.3       3.4
8   air   9       2.1       2.9
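To make the mismatch concrete, the sketch below checks which column names a stub, separator and suffix would select. The helper matching is hypothetical and only approximates the matching as I understand it (an anchored pattern built from the escaped stub, the escaped separator and the suffix regex); it is not pandas' own code.

```python
import re

columns = ['Mode', 'id', 'time.air', 'time.car']

def matching(cols, stub, sep, suffix):
    # Hypothetical helper: anchored pattern of the form stub + sep + suffix.
    pattern = re.compile(rf'^{re.escape(stub)}{re.escape(sep)}{suffix}$')
    return [c for c in cols if pattern.match(c)]

# With the default suffix r'\d+' nothing is selected, because "air" and "car" are not digits.
print(matching(columns, 'time', '.', r'\d+'))
# With r'\w+' both time columns are selected.
print(matching(columns, 'time', '.', r'\w+'))
```

Under this reading, wide_to_long finds no columns to reshape with the default suffix, which would explain the unexpected output in the question.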
One solution is to replace the categories with integers, apply wide_to_long, and then map the integers back to the categories:
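# Columns whose names start with the stub "time"
c = df.columns[df.columns.str.startswith('time')]
# Category names after the ".", here "air" and "car"
cats = c.str.split('.', expand=True).levels[1]
# Integer label (as a string) -> category, and the reverse lookup
mapping1 = {str(k):v for k, v in enumerate(cats)}
mapping2 = {v:k for k, v in mapping1.items()}
# Rename time.air and time.car to time.0 and time.1
df.columns = df.columns.to_series().replace(mapping2, regex=True)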
print (df)
   Mode  id  time.0  time.1
0   car   1     2.8     3.4
1   car   2     2.9     3.8
2   car   3     2.2     2.9
3   air   4     2.0     3.2
4   air   5     1.8     2.8
5   car   6     1.9     2.4
6   car   7     2.2     3.3
7   air   8     2.3     3.4
8   air   9     2.1     2.9
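l = pd.wide_to_long(df, stubnames='time', i='id', j='alternate',sep=".").reset_index()
# Recent pandas versions cast numeric suffixes to integers, so convert to str before mapping
l['alternate'] = l['alternate'].astype(str).map(mapping1)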
print (l)
    id  alternate  Mode  time
0    1        air   car   2.8
1    2        air   car   2.9
2    3        air   car   2.2
3    4        air   air   2.0
4    5        air   air   1.8
5    6        air   car   1.9
6    7        air   car   2.2
7    8        air   air   2.3
8    9        air   air   2.1
9    1        car   car   3.4
10   2        car   car   3.8
11   3        car   car   2.9
12   4        car   air   3.2
13   5        car   air   2.8
14   6        car   car   2.4
15   7        car   car   3.3
16   8        car   air   3.4
17   9        car   air   2.9