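I am trying to pivot some string data into columns, but I am having trouble applying past answers because I do not have a unique index or MultiIndex to pivot on.
Sample format: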
index location field value
1 location1 firstName A
2 location1 lastName B
3 location1 dob C
4 location1 email D
5 location1 title E
6 location1 address1 F
7 location1 address2 G
8 location1 address3 H
9 location1 firstName I
10 location1 lastName J
11 location1 dob K
12 location1 email L
13 location1 title M
14 location1 address1 N
15 location1 address2 O
16 location1 address3 P
40 location2 firstName Q
41 location2 lastName R
42 location2 dob S
43 location2 email T
44 location2 title U
45 location2 address1 V
46 location2 address2 W
47 location2 address3 X

Format I want to get to:
location firstName lastName dob email title address1 address2 address3
location1 A B C D E F G H
location1 I J K L M N O P
location2 Q R S T U V W X

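The closest I have come is the pivot_table call below with aggfunc='first', but it keeps only the first value for each location and field, and I need all of the values for each location, not only the first.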
df = df.pivot_table(index='location',columns='field',values='value',aggfunc='first')

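Answer 0 (score: 0)
You need to pivot on a surrogate column. Here is a solution using cumsum + set_index + unstack. The cumulative sum of field == 'firstName' increases by one at each new record, so it gives every record a unique key within its location.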
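# The counter increments at every 'firstName' row, so it numbers the records
v = df.set_index(['location', 'field', df.field.eq('firstName').cumsum()]).unstack(-2)
# Drop the surrogate counter from the index and the outer ('value') level from the columns
v.index = v.index.droplevel(-1)
v.columns = v.columns.droplevel(0)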
field address1 address2 address3 dob email firstName \
location
location1 F G H C D A
location1 N O P K L I
location2 V W X S T Q
field lastName title
location
location1 B E
location1 J M
location2 R U
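
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same cumsum + set_index + unstack idea. It rebuilds the sample data from the question, and it assumes every record starts with a firstName row (that assumption is what makes the counter a valid key). The names fields, record and wide are mine, chosen for illustration, and the reindex step is only there to restore the column order asked for in the question.

```python
import pandas as pd

# Rebuild the sample data from the question (three records of eight fields each).
fields = ["firstName", "lastName", "dob", "email",
          "title", "address1", "address2", "address3"]
df = pd.DataFrame({
    "location": ["location1"] * 16 + ["location2"] * 8,
    "field": fields * 3,
    "value": list("ABCDEFGHIJKLMNOPQRSTUVWX"),
})

# Surrogate key (hypothetical column name): increments at every 'firstName' row.
# Assumption: each record begins with a 'firstName' row.
df["record"] = df["field"].eq("firstName").cumsum()

wide = (
    df.set_index(["location", "record", "field"])["value"]
      .unstack("field")                       # one column per field
      .reindex(columns=fields)                # restore the original field order
      .reset_index(level="record", drop=True) # discard the surrogate key
      .reset_index()                          # turn 'location' back into a column
)
wide.columns.name = None  # remove the leftover 'field' label on the columns

print(wide)
```

If records do not reliably start with firstName, the counter would need to be built some other way (for example from a known record length), so treat this as a sketch rather than a general solution.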