I basically want to take this DataFrame:
   collector_id         date_created      row_id  question_id  respondent_id  survey_id
0      24785342  2015-02-25 00:40:00  3055824979    319047238     5004656403  101692922
1      24785342  2015-02-25 00:40:00  3055824979    319047238     5004656404  101692922
2      24785342  2015-02-25 00:40:00  3055824980    319047238     5004656405  101692922
3      24785342  2015-02-25 00:40:00  3055824980    319047238     5004656406  101692922
4      24785342  2015-02-25 00:40:00  3055824980    319047238     5004656407  101692922
5      24785342  2015-02-25 00:40:00  3055824980    319047238     5004656408  101692922
6      24785342  2015-02-25 00:40:00  3055824981    319047238     5004656409  101692922
and convert it to:
   collector_id         date_created   319047238  respondent_id  survey_id
0      24785342  2015-02-25 00:40:00  3055824979     5004656403  101692922
1      24785342  2015-02-25 00:40:00  3055824979     5004656404  101692922
2      24785342  2015-02-25 00:40:00  3055824980     5004656405  101692922
3      24785342  2015-02-25 00:40:00  3055824980     5004656406  101692922
4      24785342  2015-02-25 00:40:00  3055824980     5004656407  101692922
5      24785342  2015-02-25 00:40:00  3055824980     5004656408  101692922
6      24785342  2015-02-25 00:40:00  3055824981     5004656409  101692922
That is, each question_id should become its own column, with the matching row_id values listed beneath it.
Answer 0 (score: 0)
This seems to work:
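# Pivot so that each row_id becomes a column whose cells hold the question_id
df = df.pivot_table(
    values='question_id',
    index=['respondent_id', 'survey_id'],
    columns='row_id',
).reset_index()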
It returns:
row_id  respondent_id  survey_id  3055827274  3055827275  3055827276
0          5004658716  101693626   319047673         NaN         NaN
1          5004658717  101693626   319047673         NaN         NaN
2          5004658718  101693626         NaN   319047673         NaN
3          5004658719  101693626         NaN   319047673         NaN
4          5004658720  101693626         NaN   319047673         NaN
5          5004658721  101693626         NaN   319047673         NaN
6          5004658722  101693626         NaN         NaN   319047673
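Note that the call above puts the row_id values in the column headers and question_id in the cells, which is the reverse of the layout asked for in the question. A minimal sketch of the other orientation follows. It rebuilds the sample data from the question for illustration, and it assumes each respondent has at most one row_id per question_id. The choice of aggfunc="first" is my own assumption (the default, mean, would return floats), and the name `wide` is hypothetical.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "collector_id": [24785342] * 7,
    "date_created": pd.to_datetime(["2015-02-25 00:40:00"] * 7),
    "row_id": [3055824979, 3055824979, 3055824980, 3055824980,
               3055824980, 3055824980, 3055824981],
    "question_id": [319047238] * 7,
    "respondent_id": [5004656403 + i for i in range(7)],
    "survey_id": [101692922] * 7,
})

# One column per question_id, with row_id as the cell value.
# aggfunc="first" keeps the single row_id per group instead of averaging.
wide = df.pivot_table(
    values="row_id",
    index=["collector_id", "date_created", "respondent_id", "survey_id"],
    columns="question_id",
    aggfunc="first",
).reset_index()

# Drop the leftover "question_id" label on the column axis.
wide.columns.name = None

print(wide)
```

With more than one question_id in the data, each would get its own column, and respondents who did not answer a given question would show NaN there.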