I have data in the following format (a list of dictionaries, each mapping one key to a list of three lists):
[{40258: [['2018-07-03T14:13:41'], ['Open'], ['Closed']]},
{40257: [['2018-07-03T13:47:55',
'2018-07-03T14:21:52',
'2018-07-04T11:56:44'],
['Open', 'In Progress', 'Waiting on 3rd Party'],
['In Progress', 'Waiting on 3rd Party', 'In Progress']]},
{40255: [['2018-07-03T13:12:58'], ['Open'], ['Closed']]},
{40250: [[], [], []]}]
I would like to convert the above into the following df:
key List1-1 List1-2 List1-3 List2-1 List2-2 List2-3 List3-1 List3-2 List3-3
40258 2018-07-03T14:13:41 nan nan 'Open' nan nan 'Closed' nan nan
40257 2018-07-03T13:47:55 2018-07-03T14:21:52 2018-07-04T11:56:44 'Open' 'In Progress' 'Waiting on 3rd Party' 'In Progress' 'Waiting on 3rd Party' 'In Progress'
40255 2018-07-03T13:12:58 nan nan 'Open' nan nan 'Closed' nan nan
40250 nan nan nan nan nan nan nan nan nan
I tried plain pd.DataFrame and pd.DataFrame.from_dict, but could not find a way to handle multiple lists inside each dictionary.
Any help would be much appreciated.
Answer 0 (score: 3)
import numpy as np
import pandas as pd

data=[{40258: [['2018-07-03T14:13:41'], ['Open'], ['Closed']]},
{40257: [['2018-07-03T13:47:55',
'2018-07-03T14:21:52',
'2018-07-04T11:56:44'],
['Open', 'In Progress', 'Waiting on 3rd Party'],
['In Progress', 'Waiting on 3rd Party', 'In Progress']]},
{40255: [['2018-07-03T13:12:58'], ['Open'], ['Closed']]},
{40250: [[], [], []]}]
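# Pad each inner list with NaN up to length 3
f = lambda x: x + [np.nan]*(3-len(x))
# Build one flat row per key: [key, List1-1..3, List2-1..3, List3-1..3]
mod_data = [[k] + sum(map(f, v), []) for d in data for k, v in d.items()]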
cols = ['key', 'List1-1', 'List1-2', 'List1-3', 'List2-1', 'List2-2', 'List2-3', 'List3-1', 'List3-2', 'List3-3']
df = pd.DataFrame(mod_data, columns=cols).set_index('key')
print(df)
Output
List1-1 List1-2 List1-3 List2-1 List2-2 List2-3 List3-1 List3-2 List3-3
key
40258 2018-07-03T14:13:41 NaN NaN Open NaN NaN Closed NaN NaN
40257 2018-07-03T13:47:55 2018-07-03T14:21:52 2018-07-04T11:56:44 Open In Progress Waiting on 3rd Party In Progress Waiting on 3rd Party In Progress
40255 2018-07-03T13:12:58 NaN NaN Open NaN NaN Closed NaN NaN
40250 NaN NaN NaN NaN NaN NaN NaN NaN NaN
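The padding width of 3 and the column names above are hard-coded. As a rough sketch of how the same idea could adapt to longer lists, the helper below (`lists_to_frame`, a hypothetical name not from the answer) derives the width from the longest inner list and generates the column labels. It assumes every dictionary holds the same number of inner lists.

```python
import numpy as np
import pandas as pd

def lists_to_frame(records):
    """Hypothetical helper: flatten [{key: [list1, list2, ...]}, ...] into one row per key."""
    # Turn the single-key dicts into (key, lists) pairs.
    items = [(k, v) for d in records for k, v in d.items()]
    # Width of each column group = longest inner list (at least 1).
    width = max((len(lst) for _, lists in items for lst in lists), default=1) or 1
    # Assumption: every key has the same number of inner lists.
    n_lists = max((len(lists) for _, lists in items), default=0)
    rows = []
    for key, lists in items:
        row = [key]
        for lst in lists:
            # Pad each inner list with NaN up to the common width.
            row.extend(list(lst) + [np.nan] * (width - len(lst)))
        rows.append(row)
    cols = ['key'] + ['List{}-{}'.format(j, i)
                      for j in range(1, n_lists + 1)
                      for i in range(1, width + 1)]
    return pd.DataFrame(rows, columns=cols).set_index('key')

# Small made-up sample in the same shape as the question's data
sample = [{1: [['a', 'b'], ['x'], []]}, {2: [[], [], []]}]
df = lists_to_frame(sample)
print(df)
```

Calling `lists_to_frame(data)` on the question's data should give the same nine columns as above, since the longest inner list there has three items.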
Answer 1 (score: 2)
Building a list of lists and then creating the df with pd.DataFrame(data, columns=columns) seems to be the simplest option.
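import numpy as np
import pandas as pd

# `records` stands for the list of dictionaries from the question
# First calculate the length of the longest inner list; call it lmax
lmax = max(len(lst) for elem in records for lists in elem.values() for lst in lists)
data = []
for elem in records:
    for key in elem:  # Note that each dict has only one key
        lists = elem[key]  # the three inner lists
        data_curr = [np.nan] * (3 * lmax + 1)
        data_curr[0] = key
        for j, lst in enumerate(lists):
            for i, value in enumerate(lst):
                # Columns are grouped per list, so list j starts at 1 + j*lmax
                data_curr[1 + j * lmax + i] = value
        data.append(data_curr)
# These column names assume lmax == 3, as in the question's data
columns = ['key','List1-1','List1-2','List1-3','List2-1','List2-2','List2-3','List3-1','List3-2','List3-3']
df = pd.DataFrame(data, columns=columns)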
Hope that makes sense.
Answer 2 (score: 1)
Figured I would still share my solution:
import pandas as pd
from numpy import nan
mess = [{40258: [['2018-07-03T14:13:41'], ['Open'], ['Closed']]},
{40257: [['2018-07-03T13:47:55',
'2018-07-03T14:21:52',
'2018-07-04T11:56:44'],
['Open', 'In Progress', 'Waiting on 3rd Party'],
['In Progress', 'Waiting on 3rd Party', 'In Progress']]},
{40255: [['2018-07-03T13:12:58'], ['Open'], ['Closed']]},
{40250: [[], [], []]}]
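master = dict()
for dicto in mess:
    key = list(dicto.keys())[0]  # each dict has exactly one key
    lists = dicto[key]
    # Loop over the list number j first so columns come out as List1-1, List1-2, ...
    master[key] = {'List{}-{}'.format(j+1, i+1): (lists[j][i] if i < len(lists[j]) else nan)
                   for j in range(3) for i in range(3)}
# One row per key, one column per ListJ-I label
output = pd.DataFrame.from_dict(master, orient='index')
print(output.to_string())
Output: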
List1-1 List1-2 List1-3 List2-1 List2-2 List2-3 List3-1 List3-2 List3-3
40258 2018-07-03T14:13:41 NaN NaN Open NaN NaN Closed NaN NaN
40257 2018-07-03T13:47:55 2018-07-03T14:21:52 2018-07-04T11:56:44 Open In Progress Waiting on 3rd Party In Progress Waiting on 3rd Party In Progress
40255 2018-07-03T13:12:58 NaN NaN Open NaN NaN Closed NaN NaN
40250 NaN NaN NaN NaN NaN NaN NaN NaN NaN