熊猫 - 空数据帧

时间:2016-11-07 22:47:08

标签: python pandas

我正在尝试将以下数据加载到pandas dataframe

jsons_data = pd.DataFrame(columns=['playlist', 'user', 'track', 'count'])

for index, js in enumerate(json_files):
  with open(os.path.join(path_to_json, js)) as json_file:
    json_text = json.load(json_file)
    #my json layout
    user = json_text.keys()
    playlist = 'all_playlists'
    track = [p for p in json_text.values()[0]]
    count = [p.values() for p in json_text.values()]
    print jsons_data

但我得到empty dataframe

[u'user1']
all_playlists
[{u'Make You Feel My Love': 1.0, u'I See Fire': 1.0, u'High And Dry': 1.0, u'Fake Plastic Trees': 1.0, u'One': 1.0, u'Goodbye My Lover': 1.0, u'No Surprises': 1.0}]
[[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]
[u'user2']
all_playlists
[{u'Codex': 1.0, u'No Surprises': 1.0, u'O': 1.0, u'Go It Alone': 1.0}]
[[1.0, 1.0, 1.0, 1.0]]
[u'user3']
all_playlists
[{u'Fake Plastic Trees': 1.0, u'High And Dry': 1.0, u'No Surprises': 1.0}]
[[1.0, 1.0, 1.0]]
[u'user4']
all_playlists
[{u'No Distance Left To Run': 1.0, u'Running Up That Hill': 1.0, u'Fake Plastic Trees': 1.0, u'The Numbers': 1.0, u'No Surprises': 1.0}]
[[1.0, 1.0, 1.0, 1.0, 1.0]]
[u'user5']
all_playlists
[{u'Wild Wood': 1.0, u'You Do Something To Me': 1.0, u'Reprise': 1.0}]
[[1.0, 1.0, 1.0]]
Empty DataFrame
Columns: [playlist, user, track, count]
Index: []

代码有什么问题?

编辑:

json文件以这种方式构建:

{
'user1':{
'Karma Police':1.0,
'Roxanne':1.0,
'Sonnet':1.0,
'We Will Rock You':1.0,
}}

2 个答案:

答案 0 :(得分:1)

好的,首先让我们开始制作一些假数据,这样可以更容易理解这个问题:

# Dummy data to play with
data1 = {
'user1':{
    'Karma Police':1.0,
    'Roxanne':1.0,
    'Sonnet':1.0,
    'We Will Rock You':1.0,
    }
}

data2 = {
'user2':{
    'Karma Police':1.0,
    'Creep':1.0,
    }
}

让我举例说明我们将在下面使用的内容:

In : pd.DataFrame(data1).unstack()

Out:
user1  Karma Police        1.0
       Roxanne             1.0
       Sonnet              1.0
       We Will Rock You    1.0
dtype: float64
# This is where you would normally iterate on the files
mylist = []
for data in [data1, data2]:
    # Make a dataframe then unstack,
    # producing a series with a 2-multiindex as above
    # And append it to the lsit
    mylist.append(pd.DataFrame(data).unstack())

现在我们将该列表汇总,然后进行一些清理

merged = pd.concat(mylist)
# Renaming to get the right column names
merged.index.names = ['User', 'Track']
merged.name = 'Count'
# Transpose to a dataframe instead of a Series
merged = merged.to_frame()
# Adding a new column with the same value throughout
merged['Playlist'] = 'all_playlists'


merged

输出:

Output

如果你不喜欢这样,你可以调用reset_index。

答案 1 :(得分:0)

在循环结束时,只需添加:

jsons_data.loc[index] = [playlist, user, track, count]

打印:

  playlist                  user  \
0    decaf           [user1]   
1    decaf           [user2]   
2    decaf           [user3]   
3    decaf           [user4]   
4    decaf           [user5]   

                                               track  \
0  [Make You Feel My Love, I See Fire, High And D...   
1              [Codex, No Surprises, O, Go It Alone]   
2   [Fake Plastic Trees, High And Dry, No Surprises]   
3  [No Distance Left To Run, Running Up That Hill...   
4       [Wild Wood, You Do Something To Me, Reprise]   

                                   count  
0  [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]  
1                 [[1.0, 1.0, 1.0, 1.0]]  
2                      [[1.0, 1.0, 1.0]]  
3            [[1.0, 1.0, 1.0, 1.0, 1.0]]  
4                      [[1.0, 1.0, 1.0]]