Split JSON row data into multiple columns of a pandas DataFrame

Posted: 2020-01-31 19:02:15

Tags: json pandas recommender-systems

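When I read data from JSON into pandas, I get a multi-criteria hotel ratings column like the one below. My DataFrame has 2 columns, "Ratings" and "ReviewID". Because I read the DataFrame from a larger JSON file, the "Ratings" column has one entry per reviewer, in this format: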

```
result.head()
                            Ratings                      ReviewID
0   {'Service': '5', 'Cleanliness': '5', 'Overall'...     12
1   {'Service': '4', 'Cleanliness': '4', 'Overall'...     54
2   {'Service': '5', 'Cleanliness': '5', 'Overall'...     48
3   {'Service': '5', 'Cleanliness': '5', 'Overall'...     90
4   {'Service': '5', 'Cleanliness': '5', 'Overall'...     75
```

My goal is to split the Ratings column into 7 separate columns, one per rating criterion, like this:

```
ReviewID Service Cleanliness Value Rooms Location Check-in Desk  Overall
27        1          1        5      4     5        5       5      4
9         1          5        5      5     5        4       3      5
22        6          3        2      4     3        3       3      3
```

Any suggestions on how to get this format would be a great help.

[Screenshots: available DataFrame and required DataFrame]
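Because the question notes that the frame comes from a larger JSON file, another option (not from either answer) is to flatten the nested ratings at load time. A minimal sketch, assuming the raw JSON is a list of review objects that each contain a nested `Ratings` object; the `records` list is hypothetical sample data, and `pd.json_normalize` requires pandas 1.0 or later:

```python
import pandas as pd

# Hypothetical records shaped like the question's JSON: each review has a
# nested "Ratings" object (scores stored as strings) and a "ReviewID".
records = [
    {"ReviewID": 12, "Ratings": {"Service": "5", "Cleanliness": "5", "Overall": "5"}},
    {"ReviewID": 54, "Ratings": {"Service": "4", "Cleanliness": "4", "Overall": "4"}},
]

# json_normalize flattens nested dicts into "parent.child" column names,
# e.g. "Ratings.Service".
flat = pd.json_normalize(records)

# Strip the "Ratings." prefix so the columns are just the criterion names.
flat.columns = [c.replace("Ratings.", "", 1) for c in flat.columns]
print(flat)
```

The scores stay strings, exactly as in the source JSON; convert them with `pd.to_numeric` before doing arithmetic on them.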

2 answers:

Answer 0 (score: 1)

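The following code worked for me:

```python
import pandas as pd
# each Ratings cell is a dict; a list of dicts becomes one column per key
Rating = result['Ratings'].values.tolist()
rate = pd.DataFrame(Rating, columns=['Service', 'Cleanliness', 'Overall'])
```

```
   Service   Cleanliness     Overall
         0        5               5
         1        4               4
```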
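To expand on this answer: passing a list of dicts to `pd.DataFrame` creates one column per key, and `columns=` only selects a subset of those keys. The sketch below uses a hypothetical two-row stand-in for `result`, omits `columns=` so every criterion is kept, and joins `ReviewID` back on. It assumes the `Ratings` cells are already dicts, not strings.

```python
import pandas as pd

# Hypothetical stand-in for the question's `result` frame; Ratings holds
# real dicts, and the second review has an extra "Value" criterion.
result = pd.DataFrame({
    "Ratings": [
        {"Service": "5", "Cleanliness": "5", "Overall": "5"},
        {"Service": "4", "Cleanliness": "4", "Value": "3", "Overall": "4"},
    ],
    "ReviewID": [12, 54],
})

# Without columns=, every key found becomes a column; a key missing from a
# row (here "Value" in the first review) is filled with NaN.
rate = pd.DataFrame(result["Ratings"].tolist(), index=result.index)

# Put ReviewID back alongside the per-criterion columns.
out = result[["ReviewID"]].join(rate)
print(out)
```

Passing `index=result.index` keeps the new frame aligned with `result`, so the `join` pairs the right rows even when the index is not simply 0..n-1.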

Answer 1 (score: 0)

If your DataFrame looks like this:

```python
import pandas as pd
from ast import literal_eval

pd.set_option('display.max_colwidth', None)  # no truncation; -1 is deprecated
print(df)
```

```
                                                 Ratings ReviewID
0  {'Service': '5', 'Cleanliness': '5', 'Overall': '10'}     12
1  {'Service': '4', 'Cleanliness': '4', 'Overall': '10'}     54
2  {'Service': '5', 'Cleanliness': '5', 'Overall': '10'}     48
3  {'Service': '5', 'Cleanliness': '5', 'Overall': '10'}     90
4  {'Service': '5', 'Cleanliness': '5', 'Overall': '10'}     75
```

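Then we just need to parse each cell literally as a Python dict with `literal_eval` and expand it into columns with `pd.Series`. This assumes the cells are strings; if `Ratings` already holds dicts, skip `literal_eval`, which only accepts a string or an AST node.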

```python
json_series = df['Ratings'].map(literal_eval).apply(pd.Series)
```

which gives you:

```
  Service Cleanliness Overall
0  5       5           10
1  4       4           10
2  5       5           10
3  5       5           10
4  5       5           10
```
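`apply(pd.Series)` builds one Series per row, which can be slow on large frames. A commonly used alternative (swapped in here, not part of this answer) is to hand the parsed dicts to the `DataFrame` constructor in a single call. A minimal sketch on a hypothetical frame whose `Ratings` cells are strings:

```python
from ast import literal_eval

import pandas as pd

# Hypothetical frame where Ratings are strings that look like dicts,
# which is the case literal_eval is meant for.
df = pd.DataFrame({
    "Ratings": [
        "{'Service': '5', 'Cleanliness': '5', 'Overall': '10'}",
        "{'Service': '4', 'Cleanliness': '4', 'Overall': '10'}",
    ],
    "ReviewID": [12, 54],
})

parsed = df["Ratings"].map(literal_eval)  # str -> dict, one row at a time

# Build all columns at once instead of one Series per row; keeping
# df.index lets the result be concatenated back onto df.
json_series = pd.DataFrame(parsed.tolist(), index=df.index)
print(json_series)
```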

This gives us a DataFrame with the same index as `df`, so we can concatenate the two side by side:

```python
pd.concat([df, json_series], axis=1)
```

```
                                                 Ratings ReviewID Service Cleanliness Overall
0  {'Service': '5', 'Cleanliness': '5', 'Overall': '10'}     12       5           5      10
1  {'Service': '4', 'Cleanliness': '4', 'Overall': '10'}     54       4           4      10
2  {'Service': '5', 'Cleanliness': '5', 'Overall': '10'}     48       5           5      10
3  {'Service': '5', 'Cleanliness': '5', 'Overall': '10'}     90       5           5      10
4  {'Service': '5', 'Cleanliness': '5', 'Overall': '10'}     75       5           5      10
```
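To reach the layout the question asks for, the concatenated frame still needs its original `Ratings` column removed, and the scores (still strings) converted to numbers. A sketch, assuming a hypothetical `combined` frame shaped like the `pd.concat` output above, with only the three criteria shown there:

```python
import pandas as pd

# Hypothetical result of pd.concat([df, json_series], axis=1): the original
# dict column plus one string-valued column per criterion.
combined = pd.DataFrame({
    "Ratings": [
        {"Service": "5", "Cleanliness": "5", "Overall": "10"},
        {"Service": "4", "Cleanliness": "4", "Overall": "10"},
    ],
    "ReviewID": [12, 54],
    "Service": ["5", "4"],
    "Cleanliness": ["5", "4"],
    "Overall": ["10", "10"],
})

criteria = ["Service", "Cleanliness", "Overall"]

# Drop the now-redundant dict column.
final = combined.drop(columns="Ratings")

# Convert scores to numbers; anything unparseable becomes NaN.
final[criteria] = final[criteria].apply(pd.to_numeric, errors="coerce")

# ReviewID first, then one column per criterion, as in the question.
final = final[["ReviewID"] + criteria]
print(final)
```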