如何仅分割部分数据帧并轮换为新的数据帧,而不论行大小如何?

时间:2019-08-03 20:43:30

标签: python-3.x pandas dataframe

我有一个如下所示的df:

 answerRequired    answerTime       choiceId  \
0             NaN           NaN            NaN   
1             NaN           NaN            NaN   
2             NaN           NaN            NaN   
3             NaN           NaN            NaN   
4             NaN           NaN            NaN   
5             NaN           NaN            NaN   
6             NaN           NaN            NaN   
7             NaN           NaN            NaN   
8             NaN           NaN            NaN   
9             NaN           NaN            NaN   
10            NaN           NaN            NaN   
11            NaN           NaN            NaN   
12            NaN           NaN            NaN   
13          False  1.564541e+12  1542213646976   
14          False  1.564541e+12  1542213646984   
15           True  1.564541e+12  1542213646994   
16           True  1.564541e+12  1542213647040   
17           True  1.564541e+12  1542213647041   
18           True  1.564541e+12  1542213647042   
19           True  1.564541e+12  1542213647043   
20          False  1.564541e+12            NaN   

                                          choiceLabel  \
0                                                 NaN   
1                                                 NaN   
2                                                 NaN   
3                                                 NaN   
4                                                 NaN   
5                                                 NaN   
6                                                 NaN   
7                                                 NaN   
8                                                 NaN   
9                                                 NaN   
10                                                NaN   
11                                                NaN   
12                                                NaN   
13  Give it a shot! Hit the arrow below! Don't be ...   
14                                                      
15                                           T-Shirts   
16                                       Band / Music   
17                                                Fun   
18                                                 TV   
19                                             Movies   
20                                                NaN   

                                          exportLabel        logicalType  \
0                                      Participant ID                NaN   
1                                              Viewed                NaN   
2                                             Started                NaN   
3                                           Completed                NaN   
4                           Time spent (HH:MM:SS.SSS)                NaN   
5                                             Country                NaN   
6                                                City                NaN   
7                                                  IP                NaN   
8                                    Operating System                NaN   
9                                             Browser                NaN   
10                                             Device                NaN   
11                                        External ID                NaN   
12                                           Warnings                NaN   
13  It's all about the green arrow! (not that Gree...    singleSelection   
14  Make your choice. 2. Hit the green arrow at th...    singleSelection   
15  What are you most interested in? (Pick one) (T...    singleSelection   
16  We have the threads that you want! What kind o...  multipleSelection   
17  We have the threads that you want! What kind o...  multipleSelection   
18  We have the threads that you want! What kind o...  multipleSelection   
19  We have the threads that you want! What kind o...  multipleSelection   
20                                                NaN               text   

                                             question     questionId  \
0                                                 NaN  participantId   
1                                                 NaN       viewTime   
2                                                 NaN    startedTime   
3                                                 NaN  completedTime   
4                                                 NaN      timeSpent   
5                                                 NaN   country_name   
6                                                 NaN           city   
7                                                 NaN             ip   
8                                                 NaN             os   
9                                                 NaN        browser   
10                                                NaN         device   
11                                                NaN     externalId   
12                                                NaN       warnings   
13  It's all about the green arrow! (not that Gree...  1542213646975   
14  Make your choice. 2. Hit the green arrow at th...  1542213646983   
15        What are you most interested in? (Pick one)  1542213646991   
16  We have the threads that you want! What kind o...  1542213647039   
17  We have the threads that you want! What kind o...  1542213647039   
18  We have the threads that you want! What kind o...  1542213647039   
19  We have the threads that you want! What kind o...  1542213647039   
20             Almost Done! Enter Your Email Address!  1542213647050   

    questionOrder            subType          type                     value  \
0             NaN                NaN            id      -Ll4truw3KbSjVRtXmJy   
1             NaN                NaN          time  2019-07-31T02:41:34.063Z   
2             NaN                NaN          time  2019-07-31T02:44:37.732Z   
3             NaN                NaN          time  2019-07-31T02:44:57.936Z   
4             NaN                NaN          time              00:00:00.000   
5             NaN                NaN      location                   Unknown   
6             NaN                NaN      location                Roslindale   
7             NaN                NaN      location                             
8             NaN                NaN        device               macOS 10.14   
9             NaN                NaN        device              Firefox 68.0   
10            NaN                NaN        device                             
11            NaN                NaN            id                             
12            NaN                NaN          info                        []   
13            0.0    singleSelection  mediaGallery                      True   
14            2.0    singleSelection  mediaGallery                      True   
15            4.0    singleSelection  mediaGallery                      True   
16           12.0  multipleSelection  mediaGallery                      True   
17           12.0  multipleSelection  mediaGallery                      True   
18           12.0  multipleSelection  mediaGallery                      True   
19           12.0  multipleSelection  mediaGallery                      True   
20           14.0                NaN      emailBox      123456789@yellow.com   

      visualType  
0            NaN  
1            NaN  
2            NaN  
3            NaN  
4            NaN  
5            NaN  
6            NaN  
7            NaN  
8            NaN  
9            NaN  
10           NaN  
11           NaN  
12           NaN  
13  mediaGallery  
14  mediaGallery  
15  mediaGallery  
16  mediaGallery  
17  mediaGallery  
18  mediaGallery  
19  mediaGallery  
20      emailBox  

如何切割数据框并使其如下所示:

enter image description here

我尝试过:

  1. df.T.stack()
  2. df_stack_test.T.groupby('level_1')[0] .apply(lambda x:pd.Series(list(x)))。unstack()。T

,但是这些命令在不汇总数据的情况下翻转了数据。

我想在高水平上

exportLabel列值翻转为列,将value列值翻转为exportLabel的列值下的值,只有question列为null的情况下。

然后,我想将question列的值翻转为不为null的列以及问题列下choicelabel的值。注意与同一问题相同的问题,将被折叠到一栏中。唯一的例外是question列中的最后一个值可以在value列中进行选择。

我们现在可以删除其余的列。另外,我可以发布要从API展平的原始json字符串。

编辑:

这是json字符串:

{"id":"4","survey_id":"-L","response_id":"-L","response_url":"data":[{"type":"id","questionId":"participantId","exportLabel":"Participant ID","value":"-Ll4truw3KbSjVRtXmJy"},{"type":"time","questionId":"viewTime","exportLabel":"Viewed","value":"2019-07-31T02:41:34.063Z"},{"type":"time","questionId":"startedTime","exportLabel":"Started","value":"2019-07-31T02:44:37.732Z"},{"type":"time","questionId":"completedTime","exportLabel":"Completed","value":"2019-07-31T02:44:57.936Z"},{"type":"time","questionId":"timeSpent","exportLabel":"Time spent (HH:MM:SS.SSS)","value":"00:00:00.000"},{"type":"location","questionId":"country_name","exportLabel":"Country","value":"Unknown"},{"type":"location","questionId":"city","exportLabel":"City","value":"Roslindale"},{"type":"location","questionId":"ip","exportLabel":"IP","value":""},{"type":"device","questionId":"os","exportLabel":"Operating System","value":"macOS 10.14"},{"type":"device","questionId":"browser","exportLabel":"Browser","value":"Firefox 68.0"},{"type":"device","questionId":"device","exportLabel":"Device","value":""},{"type":"id","questionId":"externalId","exportLabel":"External ID","value":""},{"type":"info","questionId":"warnings","exportLabel":"Warnings","value":[]},{"logicalType":"singleSelection","choiceId":"1542213646976","choiceLabel":"Give it a shot! Hit the arrow below! Don't be shy!","exportLabel":"It's all about the green arrow! (not that Green Arrow!) 1. Make your choice. 2. Hit the green arrow at the bottom! (Give it a shot! Hit the arrow below! Don't be shy!)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":false,"questionId":1542213646975,"questionOrder":0,"question":"It's all about the green arrow! (not that Green Arrow!) 1. Make your choice. 2. Hit the green arrow at the bottom!","subType":"singleSelection","answerTime":1564541080009},{"logicalType":"singleSelection","choiceId":"1542213646984","choiceLabel":"","exportLabel":"Make your choice. 2. Hit the green arrow at the bottom! ()","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":false,"questionId":1542213646983,"questionOrder":2,"question":"Make your choice. 2. Hit the green arrow at the bottom!","subType":"singleSelection","answerTime":1564541081044},{"logicalType":"singleSelection","choiceId":"1542213646994","choiceLabel":"T-Shirts","exportLabel":"What are you most interested in? (Pick one) (T-Shirts)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":true,"questionId":1542213646991,"questionOrder":4,"question":"What are you most interested in? (Pick one)","subType":"singleSelection","answerTime":1564541083354},{"logicalType":"multipleSelection","choiceId":"1542213647040","choiceLabel":"Band / Music","exportLabel":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!) (Band / Music)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":true,"questionId":1542213647039,"questionOrder":12,"question":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!)","subType":"multipleSelection","answerTime":1564541086280},{"logicalType":"multipleSelection","choiceId":"1542213647041","choiceLabel":"Fun","exportLabel":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!) (Fun)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":true,"questionId":1542213647039,"questionOrder":12,"question":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!)","subType":"multipleSelection","answerTime":1564541086280},{"logicalType":"multipleSelection","choiceId":"1542213647042","choiceLabel":"TV","exportLabel":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!) (TV)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":true,"questionId":1542213647039,"questionOrder":12,"question":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!)","subType":"multipleSelection","answerTime":1564541086280},{"logicalType":"multipleSelection","choiceId":"1542213647043","choiceLabel":"Movies","exportLabel":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!) (Movies)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":true,"questionId":1542213647039,"questionOrder":12,"question":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!)","subType":"multipleSelection","answerTime":1564541086280},{"type":"emailBox","visualType":"emailBox","answerRequired":false,"questionId":1542213647050,"questionOrder":14,"question":"Almost Done! Enter Your Email Address!","answerTime":1564541097466,"logicalType":"text","value":"123456789@yellow.com"}]}

我将字符串转换为第一个df,如下所示:

from pandas.io.json import json_normalize
import pandas as pd
import json

with open('jsonfile') as json_file:
    data = json.load(json_normalize(json_file))


df = json_normalize(data['data'])

1 个答案:

答案 0 :(得分:1)

如果需要像原始添加boolean indexing中一样的顺序,则按GroupBy.cumcount的条件对行进行过滤,然后按DataFrame.unstack的形状对计数器和DataFrame.reindex进行整形。

第一部分是:

df1 = data.loc[data['question'].isna(), ['exportLabel','value']]
print (df1)
                  exportLabel                     value
0              Participant ID      -Ll4truw3KbSjVRtXmJy
1                      Viewed  2019-07-31T02:41:34.063Z
2                     Started  2019-07-31T02:44:37.732Z
3                   Completed  2019-07-31T02:44:57.936Z
4   Time spent (HH:MM:SS.SSS)              00:00:00.000
5                     Country                   Unknown
6                        City                Roslindale
7                          IP                          
8            Operating System               macOS 10.14
9                     Browser              Firefox 68.0
10                     Device                          
11                External ID                          
12                   Warnings                        []

df11 = (df1.set_index([df1.groupby('exportLabel').cumcount(),
                      'exportLabel'])['value']
           .unstack()
           .rename_axis(None, axis=1)
           .reindex(df1['exportLabel'].unique(), axis=1)
           )
print (df11)
         Participant ID                    Viewed                   Started  \
0  -Ll4truw3KbSjVRtXmJy  2019-07-31T02:41:34.063Z  2019-07-31T02:44:37.732Z   

                  Completed Time spent (HH:MM:SS.SSS)  Country        City IP  \
0  2019-07-31T02:44:57.936Z              00:00:00.000  Unknown  Roslindale      

  Operating System       Browser Device External ID Warnings  
0      macOS 10.14  Firefox 68.0                          [] 

第二:

df2 = data.loc[data['question'].notna(), ['question','value','choiceLabel']]

#if need replace all missing values by value column
#df2['choiceLabel'] = df2['choiceLabel'].fillna(df2['value'])

#if need replace only last value if missing
idx = df2.index[[-1]]
df2.loc[idx,'choiceLabel'] = df2.loc[idx,'choiceLabel'].fillna(df2.loc[idx,'value'])
print (df2)
                                             question                 value  \
13  It's all about the green arrow! (not that Gree...                  True   
14  Make your choice. 2. Hit the green arrow at th...                  True   
15        What are you most interested in? (Pick one)                  True   
16  We have the threads that you want! What kind o...                  True   
17  We have the threads that you want! What kind o...                  True   
18  We have the threads that you want! What kind o...                  True   
19  We have the threads that you want! What kind o...                  True   
20             Almost Done! Enter Your Email Address!  123456789@yellow.com   

                                          choiceLabel  
13  Give it a shot! Hit the arrow below! Don't be ...  
14                                                     
15                                           T-Shirts  
16                                       Band / Music  
17                                                Fun  
18                                                 TV  
19                                             Movies  
20                               123456789@yellow.com  

df21 = (df2.set_index([df2.groupby('question').cumcount(),
                      'question'])['choiceLabel']
           .unstack()
           .rename_axis(None, axis=1)
           .reindex(df2['question'].unique(), axis=1)
           )

print (df21)
  It's all about the green arrow! (not that Green Arrow!) 1. Make your choice. 2. Hit the green arrow at the bottom!  \
0  Give it a shot! Hit the arrow below! Don't be ...                                                                   
1                                                NaN                                                                   
2                                                NaN                                                                   
3                                                NaN                                                                   

  Make your choice. 2. Hit the green arrow at the bottom!  \
0                                                           
1                                                NaN        
2                                                NaN        
3                                                NaN        

  What are you most interested in? (Pick one)  \
0                                    T-Shirts   
1                                         NaN   
2                                         NaN   
3                                         NaN   

  We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!)  \
0                                       Band / Music                                                               
1                                                Fun                                                               
2                                                 TV                                                               
3                                             Movies                                                               

  Almost Done! Enter Your Email Address!  
0                   123456789@yellow.com  
1                                    NaN  
2                                    NaN  
3                                    NaN