我有一个如下所示的df:
answerRequired answerTime choiceId \
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
6 NaN NaN NaN
7 NaN NaN NaN
8 NaN NaN NaN
9 NaN NaN NaN
10 NaN NaN NaN
11 NaN NaN NaN
12 NaN NaN NaN
13 False 1.564541e+12 1542213646976
14 False 1.564541e+12 1542213646984
15 True 1.564541e+12 1542213646994
16 True 1.564541e+12 1542213647040
17 True 1.564541e+12 1542213647041
18 True 1.564541e+12 1542213647042
19 True 1.564541e+12 1542213647043
20 False 1.564541e+12 NaN
choiceLabel \
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 Give it a shot! Hit the arrow below! Don't be ...
14
15 T-Shirts
16 Band / Music
17 Fun
18 TV
19 Movies
20 NaN
exportLabel logicalType \
0 Participant ID NaN
1 Viewed NaN
2 Started NaN
3 Completed NaN
4 Time spent (HH:MM:SS.SSS) NaN
5 Country NaN
6 City NaN
7 IP NaN
8 Operating System NaN
9 Browser NaN
10 Device NaN
11 External ID NaN
12 Warnings NaN
13 It's all about the green arrow! (not that Gree... singleSelection
14 Make your choice. 2. Hit the green arrow at th... singleSelection
15 What are you most interested in? (Pick one) (T... singleSelection
16 We have the threads that you want! What kind o... multipleSelection
17 We have the threads that you want! What kind o... multipleSelection
18 We have the threads that you want! What kind o... multipleSelection
19 We have the threads that you want! What kind o... multipleSelection
20 NaN text
question questionId \
0 NaN participantId
1 NaN viewTime
2 NaN startedTime
3 NaN completedTime
4 NaN timeSpent
5 NaN country_name
6 NaN city
7 NaN ip
8 NaN os
9 NaN browser
10 NaN device
11 NaN externalId
12 NaN warnings
13 It's all about the green arrow! (not that Gree... 1542213646975
14 Make your choice. 2. Hit the green arrow at th... 1542213646983
15 What are you most interested in? (Pick one) 1542213646991
16 We have the threads that you want! What kind o... 1542213647039
17 We have the threads that you want! What kind o... 1542213647039
18 We have the threads that you want! What kind o... 1542213647039
19 We have the threads that you want! What kind o... 1542213647039
20 Almost Done! Enter Your Email Address! 1542213647050
questionOrder subType type value \
0 NaN NaN id -Ll4truw3KbSjVRtXmJy
1 NaN NaN time 2019-07-31T02:41:34.063Z
2 NaN NaN time 2019-07-31T02:44:37.732Z
3 NaN NaN time 2019-07-31T02:44:57.936Z
4 NaN NaN time 00:00:00.000
5 NaN NaN location Unknown
6 NaN NaN location Roslindale
7 NaN NaN location
8 NaN NaN device macOS 10.14
9 NaN NaN device Firefox 68.0
10 NaN NaN device
11 NaN NaN id
12 NaN NaN info []
13 0.0 singleSelection mediaGallery True
14 2.0 singleSelection mediaGallery True
15 4.0 singleSelection mediaGallery True
16 12.0 multipleSelection mediaGallery True
17 12.0 multipleSelection mediaGallery True
18 12.0 multipleSelection mediaGallery True
19 12.0 multipleSelection mediaGallery True
20 14.0 NaN emailBox 123456789@yellow.com
visualType
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 mediaGallery
14 mediaGallery
15 mediaGallery
16 mediaGallery
17 mediaGallery
18 mediaGallery
19 mediaGallery
20 emailBox
如何切割数据框并使其如下所示:
我尝试过:
,但是这些命令在不汇总数据的情况下翻转了数据。
我想在高水平上
将exportLabel
列值翻转为列,将value
列值翻转为exportLabel
的列值下的值,只有question
列为null的情况下。
然后,我想将question
列的值翻转为不为null的列以及问题列下choicelabel
的值。注意与同一问题相同的问题,将被折叠到一栏中。唯一的例外是question
列中的最后一个值可以在value
列中进行选择。
我们现在可以删除其余的列。另外,我可以发布要从API展平的原始json字符串。
编辑:
这是json字符串:
{"id":"4","survey_id":"-L","response_id":"-L","response_url":"data":[{"type":"id","questionId":"participantId","exportLabel":"Participant ID","value":"-Ll4truw3KbSjVRtXmJy"},{"type":"time","questionId":"viewTime","exportLabel":"Viewed","value":"2019-07-31T02:41:34.063Z"},{"type":"time","questionId":"startedTime","exportLabel":"Started","value":"2019-07-31T02:44:37.732Z"},{"type":"time","questionId":"completedTime","exportLabel":"Completed","value":"2019-07-31T02:44:57.936Z"},{"type":"time","questionId":"timeSpent","exportLabel":"Time spent (HH:MM:SS.SSS)","value":"00:00:00.000"},{"type":"location","questionId":"country_name","exportLabel":"Country","value":"Unknown"},{"type":"location","questionId":"city","exportLabel":"City","value":"Roslindale"},{"type":"location","questionId":"ip","exportLabel":"IP","value":""},{"type":"device","questionId":"os","exportLabel":"Operating System","value":"macOS 10.14"},{"type":"device","questionId":"browser","exportLabel":"Browser","value":"Firefox 68.0"},{"type":"device","questionId":"device","exportLabel":"Device","value":""},{"type":"id","questionId":"externalId","exportLabel":"External ID","value":""},{"type":"info","questionId":"warnings","exportLabel":"Warnings","value":[]},{"logicalType":"singleSelection","choiceId":"1542213646976","choiceLabel":"Give it a shot! Hit the arrow below! Don't be shy!","exportLabel":"It's all about the green arrow! (not that Green Arrow!) 1. Make your choice. 2. Hit the green arrow at the bottom! (Give it a shot! Hit the arrow below! Don't be shy!)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":false,"questionId":1542213646975,"questionOrder":0,"question":"It's all about the green arrow! (not that Green Arrow!) 1. Make your choice. 2. Hit the green arrow at the bottom!","subType":"singleSelection","answerTime":1564541080009},{"logicalType":"singleSelection","choiceId":"1542213646984","choiceLabel":"","exportLabel":"Make your choice. 2. Hit the green arrow at the bottom! ()","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":false,"questionId":1542213646983,"questionOrder":2,"question":"Make your choice. 2. Hit the green arrow at the bottom!","subType":"singleSelection","answerTime":1564541081044},{"logicalType":"singleSelection","choiceId":"1542213646994","choiceLabel":"T-Shirts","exportLabel":"What are you most interested in? (Pick one) (T-Shirts)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":true,"questionId":1542213646991,"questionOrder":4,"question":"What are you most interested in? (Pick one)","subType":"singleSelection","answerTime":1564541083354},{"logicalType":"multipleSelection","choiceId":"1542213647040","choiceLabel":"Band / Music","exportLabel":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!) (Band / Music)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":true,"questionId":1542213647039,"questionOrder":12,"question":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!)","subType":"multipleSelection","answerTime":1564541086280},{"logicalType":"multipleSelection","choiceId":"1542213647041","choiceLabel":"Fun","exportLabel":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!) (Fun)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":true,"questionId":1542213647039,"questionOrder":12,"question":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!)","subType":"multipleSelection","answerTime":1564541086280},{"logicalType":"multipleSelection","choiceId":"1542213647042","choiceLabel":"TV","exportLabel":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!) (TV)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":true,"questionId":1542213647039,"questionOrder":12,"question":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!)","subType":"multipleSelection","answerTime":1564541086280},{"logicalType":"multipleSelection","choiceId":"1542213647043","choiceLabel":"Movies","exportLabel":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!) (Movies)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":true,"questionId":1542213647039,"questionOrder":12,"question":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!)","subType":"multipleSelection","answerTime":1564541086280},{"type":"emailBox","visualType":"emailBox","answerRequired":false,"questionId":1542213647050,"questionOrder":14,"question":"Almost Done! Enter Your Email Address!","answerTime":1564541097466,"logicalType":"text","value":"123456789@yellow.com"}]}
我将字符串转换为第一个df,如下所示:
from pandas.io.json import json_normalize
import pandas as pd
import json
with open('jsonfile') as json_file:
data = json.load(json_normalize(json_file))
df = json_normalize(data['data'])
答案 0 :(得分:1)
如果需要像原始添加boolean indexing
中一样的顺序,则按GroupBy.cumcount
的条件对行进行过滤,然后按DataFrame.unstack
的形状对计数器和DataFrame.reindex
进行整形。
第一部分是:
df1 = data.loc[data['question'].isna(), ['exportLabel','value']]
print (df1)
exportLabel value
0 Participant ID -Ll4truw3KbSjVRtXmJy
1 Viewed 2019-07-31T02:41:34.063Z
2 Started 2019-07-31T02:44:37.732Z
3 Completed 2019-07-31T02:44:57.936Z
4 Time spent (HH:MM:SS.SSS) 00:00:00.000
5 Country Unknown
6 City Roslindale
7 IP
8 Operating System macOS 10.14
9 Browser Firefox 68.0
10 Device
11 External ID
12 Warnings []
df11 = (df1.set_index([df1.groupby('exportLabel').cumcount(),
'exportLabel'])['value']
.unstack()
.rename_axis(None, axis=1)
.reindex(df1['exportLabel'].unique(), axis=1)
)
print (df11)
Participant ID Viewed Started \
0 -Ll4truw3KbSjVRtXmJy 2019-07-31T02:41:34.063Z 2019-07-31T02:44:37.732Z
Completed Time spent (HH:MM:SS.SSS) Country City IP \
0 2019-07-31T02:44:57.936Z 00:00:00.000 Unknown Roslindale
Operating System Browser Device External ID Warnings
0 macOS 10.14 Firefox 68.0 []
第二:
df2 = data.loc[data['question'].notna(), ['question','value','choiceLabel']]
#if need replace all missing values by value column
#df2['choiceLabel'] = df2['choiceLabel'].fillna(df2['value'])
#if need replace only last value if missing
idx = df2.index[[-1]]
df2.loc[idx,'choiceLabel'] = df2.loc[idx,'choiceLabel'].fillna(df2.loc[idx,'value'])
print (df2)
question value \
13 It's all about the green arrow! (not that Gree... True
14 Make your choice. 2. Hit the green arrow at th... True
15 What are you most interested in? (Pick one) True
16 We have the threads that you want! What kind o... True
17 We have the threads that you want! What kind o... True
18 We have the threads that you want! What kind o... True
19 We have the threads that you want! What kind o... True
20 Almost Done! Enter Your Email Address! 123456789@yellow.com
choiceLabel
13 Give it a shot! Hit the arrow below! Don't be ...
14
15 T-Shirts
16 Band / Music
17 Fun
18 TV
19 Movies
20 123456789@yellow.com
df21 = (df2.set_index([df2.groupby('question').cumcount(),
'question'])['choiceLabel']
.unstack()
.rename_axis(None, axis=1)
.reindex(df2['question'].unique(), axis=1)
)
print (df21)
It's all about the green arrow! (not that Green Arrow!) 1. Make your choice. 2. Hit the green arrow at the bottom! \
0 Give it a shot! Hit the arrow below! Don't be ...
1 NaN
2 NaN
3 NaN
Make your choice. 2. Hit the green arrow at the bottom! \
0
1 NaN
2 NaN
3 NaN
What are you most interested in? (Pick one) \
0 T-Shirts
1 NaN
2 NaN
3 NaN
We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!) \
0 Band / Music
1 Fun
2 TV
3 Movies
Almost Done! Enter Your Email Address!
0 123456789@yellow.com
1 NaN
2 NaN
3 NaN