Here is my data:
Identification Req
RCFD1797 Violet
BHCKK085 Green
RCFD1797 Green
BHCKK085 Orange
RCFD1797 Blue
BHCKK085 Yellow
BHCKK085 Red
WRSS1797 Green
WRSS1797 Violet
WRSS1797 Blue
RCON1797 Violet
RCON1797 Green
RCON1797 Blue
RCON1797 Indigo
BHDM1797 Violet
BHDM1797 Green
BHDM1797 Blue
BHDM1797 Indigo
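The first column is filled with repeated ID numbers. For example, "RCFD1797" appears three times, with one requirement per row. This is what I need it to look like: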
Identification Req_1 Req_2 Req_3 Req_4
RCFD1797 Violet Green Blue
BHCKK085 Green Orange Yellow Red
WRSS1797 Green Violet Blue
RCON1797 Violet Green Blue Indigo
BHDM1797 Violet Green Blue Indigo
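I have no trouble importing the Excel file with Pandas, but I can't figure out how to set up the DataFrame to produce the second table above. Any ideas?
Thanks!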
Answer 0: (score: 3)
Try this:
test
# Only part of the table is pasted here
Identification Req
0 RCFD1797 Violet
1 BHCKK085 Green
2 RCFD1797 Green
3 BHCKK085 Orange
4 RCFD1797 Blue
5 BHCKK085 Yellow
6 BHCKK085 Red
.. ... ...
my_df = test.groupby('Identification')['Req'].apply(list).apply(pd.Series)
my_df.columns = ['Req'+str(i) for i in my_df.columns]
my_df
Req0 Req1 Req2 Req3
Identification
BHCKK085 Green Orange Yellow Red
BHDM1797 Violet Green Blue Indigo
RCFD1797 Violet Green Blue NaN
RCON1797 Violet Green Blue Indigo
WRSS1797 Green Violet Blue NaN
Hope this helps.
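A variant of the approach above, offered only as a sketch: building the wide frame directly from the grouped lists avoids `apply(pd.Series)`, which tends to be slow when there are many groups, and numbering the columns from 1 with an underscore matches the `Req_1`, `Req_2`, ... names asked for in the question. The `test` frame below is a shortened stand-in for the real data.

```python
import pandas as pd

# Shortened stand-in for the frame loaded from Excel.
test = pd.DataFrame({
    "Identification": ["RCFD1797", "BHCKK085", "RCFD1797", "BHCKK085",
                       "RCFD1797", "BHCKK085", "BHCKK085"],
    "Req": ["Violet", "Green", "Green", "Orange", "Blue", "Yellow", "Red"],
})

# One Python list of requirements per ID.
lists = test.groupby("Identification")["Req"].apply(list)

# Expand the lists into columns; shorter lists are padded with missing values.
my_df = pd.DataFrame(lists.tolist(), index=lists.index)

# Rename the integer columns 0, 1, 2, ... to Req_1, Req_2, Req_3, ...
my_df.columns = [f"Req_{i + 1}" for i in my_df.columns]

print(my_df)
```

Missing requirements come out as missing values here; appending `.fillna('')` would blank them out if that is preferred.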
Answer 1: (score: 1)
Use set_index:
df.set_index([
'Identification',
df.groupby('Identification').cumcount().add(1).astype(str).radd('Req_')
]).Req.unstack(fill_value='')
Req_1 Req_2 Req_3 Req_4
Identification
BHCKK085 Green Orange Yellow Red
BHDM1797 Violet Green Blue Indigo
RCFD1797 Violet Green Blue
RCON1797 Violet Green Blue Indigo
WRSS1797 Green Violet Blue
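Both results above come back sorted by ID, while the table in the question keeps the IDs in the order they first appear. A minimal sketch of one way to preserve that order, assuming the data has already been loaded into a frame called `df` (rebuilt by hand here so the snippet runs on its own): number each requirement within its ID using `cumcount`, pivot on that number, then reindex by first appearance.

```python
import pandas as pd

# The question's data, typed in by hand; in practice this would come from pd.read_excel.
df = pd.DataFrame({
    "Identification": [
        "RCFD1797", "BHCKK085", "RCFD1797", "BHCKK085", "RCFD1797",
        "BHCKK085", "BHCKK085", "WRSS1797", "WRSS1797", "WRSS1797",
        "RCON1797", "RCON1797", "RCON1797", "RCON1797",
        "BHDM1797", "BHDM1797", "BHDM1797", "BHDM1797",
    ],
    "Req": [
        "Violet", "Green", "Green", "Orange", "Blue",
        "Yellow", "Red", "Green", "Violet", "Blue",
        "Violet", "Green", "Blue", "Indigo",
        "Violet", "Green", "Blue", "Indigo",
    ],
})

wide = (
    # pos = 1, 2, 3, ... for each requirement within its ID
    df.assign(pos=df.groupby("Identification").cumcount() + 1)
      # one row per ID, one column per position
      .pivot(index="Identification", columns="pos", values="Req")
      # integer column labels 1, 2, ... become Req_1, Req_2, ...
      .add_prefix("Req_")
      # restore the order in which the IDs first appear
      .reindex(df["Identification"].unique())
      .rename_axis(index="Identification", columns=None)
      # blank out the slots that have no requirement
      .fillna("")
      .reset_index()
)

print(wide)
```

Pivoting on the integer position rather than on a string label keeps the columns in numeric order even if an ID has ten or more requirements.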