透视Pandas Dataframe,没有数字类型,索引不是唯一的

时间:2018-02-05 19:30:07

标签: python pandas pivot-table

我正在尝试将一些字符串数据转换为列,但是使用past responses时遇到了困难,因为我没有可以使用的唯一索引或多索引。

样本格式



  index	location	field	        value
1	location1	firstName	A
2	location1	lastName	B
3	location1	dob	        C 
4	location1	email	        D
5	location1	title	        E
6	location1	address1	F
7	location1	address2	G
8	location1	address3	H
9	location1	firstName	I
10	location1	lastName	J
11	location1	dob	        K
12	location1	email	        L
13	location1	title	        M
14	location1	address1	N
15	location1	address2	O
16	location1	address3	P
40	location2	firstName	Q
41	location2	lastName	R
42	location2	dob	        S
43	location2	email	        T
44	location2	title	        U
45	location2	address1	V
46	location2	address2	W
47	location2	address3	X




格式我想转到:



location	firstName lastName dob email title address1	address2 address3
location1	A	B	C	D	E	F	G	H
location1	I	J	K	L	M	N	O	P
location2	Q	R	S	T	U	V	W	X




我最接近实现这一目标的方法是使用aggfuc =' first',但这需要每个位置的所有值,而不仅仅是第一个。

格式我想转到:



df = df.pivot_table(index='location',columns='field',values='value',aggfunc='first')




1 个答案:

答案 0 :(得分:0)

您需要使用代理列进行转动。以下是使用cumsum + set_index + unstack的解决方案。

v = df.set_index(['location', 'field', df.field.eq('firstName').cumsum()]).unstack(-2) 
v.index = v.index.droplevel(-1)
v.columns = v.columns.droplevel(0)

field     address1 address2 address3         dob      email firstName  \
location                                                                
location1        F        G        H          C           D         A   
location1        N        O        P           K          L         I   
location2        V        W        X           S          T         Q   

field     lastName      title  
location                       
location1        B          E  
location1        J          M  
location2        R          U