我正在尝试将以下数据加载到pandas
dataframe
:
jsons_data = pd.DataFrame(columns=['playlist', 'user', 'track', 'count'])
for index, js in enumerate(json_files):
with open(os.path.join(path_to_json, js)) as json_file:
json_text = json.load(json_file)
#my json layout
user = json_text.keys()
playlist = 'all_playlists'
track = [p for p in json_text.values()[0]]
count = [p.values() for p in json_text.values()]
print jsons_data
但我得到empty dataframe
:
[u'user1']
all_playlists
[{u'Make You Feel My Love': 1.0, u'I See Fire': 1.0, u'High And Dry': 1.0, u'Fake Plastic Trees': 1.0, u'One': 1.0, u'Goodbye My Lover': 1.0, u'No Surprises': 1.0}]
[[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]
[u'user2']
all_playlists
[{u'Codex': 1.0, u'No Surprises': 1.0, u'O': 1.0, u'Go It Alone': 1.0}]
[[1.0, 1.0, 1.0, 1.0]]
[u'user3']
all_playlists
[{u'Fake Plastic Trees': 1.0, u'High And Dry': 1.0, u'No Surprises': 1.0}]
[[1.0, 1.0, 1.0]]
[u'user4']
all_playlists
[{u'No Distance Left To Run': 1.0, u'Running Up That Hill': 1.0, u'Fake Plastic Trees': 1.0, u'The Numbers': 1.0, u'No Surprises': 1.0}]
[[1.0, 1.0, 1.0, 1.0, 1.0]]
[u'user5']
all_playlists
[{u'Wild Wood': 1.0, u'You Do Something To Me': 1.0, u'Reprise': 1.0}]
[[1.0, 1.0, 1.0]]
Empty DataFrame
Columns: [playlist, user, track, count]
Index: []
代码有什么问题?
编辑:
json
文件以这种方式构建:
{
'user1':{
'Karma Police':1.0,
'Roxanne':1.0,
'Sonnet':1.0,
'We Will Rock You':1.0,
}}
答案 0 :(得分:1)
好的,首先让我们开始制作一些假数据,这样可以更容易理解这个问题:
# Dummy data to play with
data1 = {
'user1':{
'Karma Police':1.0,
'Roxanne':1.0,
'Sonnet':1.0,
'We Will Rock You':1.0,
}
}
data2 = {
'user2':{
'Karma Police':1.0,
'Creep':1.0,
}
}
让我举例说明我们将在下面使用的内容:
In : pd.DataFrame(data1).unstack()
Out:
user1 Karma Police 1.0
Roxanne 1.0
Sonnet 1.0
We Will Rock You 1.0
dtype: float64
# This is where you would normally iterate on the files
mylist = []
for data in [data1, data2]:
# Make a dataframe then unstack,
# producing a series with a 2-multiindex as above
# And append it to the lsit
mylist.append(pd.DataFrame(data).unstack())
现在我们将该列表汇总,然后进行一些清理
merged = pd.concat(mylist)
# Renaming to get the right column names
merged.index.names = ['User', 'Track']
merged.name = 'Count'
# Transpose to a dataframe instead of a Series
merged = merged.to_frame()
# Adding a new column with the same value throughout
merged['Playlist'] = 'all_playlists'
merged
输出:
如果你不喜欢这样,你可以调用reset_index。
答案 1 :(得分:0)
在循环结束时,只需添加:
jsons_data.loc[index] = [playlist, user, track, count]
打印:
playlist user \
0 decaf [user1]
1 decaf [user2]
2 decaf [user3]
3 decaf [user4]
4 decaf [user5]
track \
0 [Make You Feel My Love, I See Fire, High And D...
1 [Codex, No Surprises, O, Go It Alone]
2 [Fake Plastic Trees, High And Dry, No Surprises]
3 [No Distance Left To Run, Running Up That Hill...
4 [Wild Wood, You Do Something To Me, Reprise]
count
0 [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]
1 [[1.0, 1.0, 1.0, 1.0]]
2 [[1.0, 1.0, 1.0]]
3 [[1.0, 1.0, 1.0, 1.0, 1.0]]
4 [[1.0, 1.0, 1.0]]