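I'm trying to convert a 4-dimensional list into a pandas DataFrame. I have a solution that does this with a triple-nested for loop, but it is highly unoptimized, and I feel there must be a faster way. The code I've been using is: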
```python
import pandas as pd

master_df = pd.DataFrame(columns=('a1', 'a2', 'intersection', 'similarity'))
for i in master_list[0:2]:
    for x in i:
        for y in x:
            t = [y[0], y[1], repr(y[2]), y[3]]
            master_df.loc[-1] = t
            master_df.index = master_df.index + 1
master_df = master_df.sort_index()
```
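A likely contributor to the slowness is that each `master_df.loc[-1] = t` enlarges the DataFrame one row at a time and the whole index is rebuilt after every row, so pandas keeps copying data. A minimal sketch, assuming the records always sit three levels deep as in the sample data, keeps the same triple loop but appends to a plain Python list and builds the DataFrame once at the end. The `master_list` below is a trimmed copy of the sample data so the snippet runs on its own.

```python
import pandas as pd

# Trimmed copy of the sample data (assumption: same nesting as the full list)
master_list = [
    [[['residential property 42 holywell hill st. albans east of england al1 1bx',
       'gnd flr 38 holywell hill st albans herts al1 1bx',
       {'1bx', 'al1', 'albans', 'hill', 'holywell'},
       0.5809767086589066],
      ['residential property 42 holywell hill st. albans east of england al1 1bx',
       '62 holywell hill st albans herts al1 1bx',
       {'1bx', 'al1', 'albans', 'hill', 'holywell'},
       0.62250400597525191]]],
    [[['aitchisons 2 holywell hill st. albans east of england al1 1bz',
       '22 holywell hill st albans herts al1 1bz',
       {'1bz', 'al1', 'albans', 'hill', 'holywell'},
       0.64696827426453596]]],
]

rows = []  # appending to a list is cheap, unlike enlarging a DataFrame
for i in master_list[0:2]:
    for x in i:
        for y in x:
            # same row layout as the original loop: the set is stored via repr()
            rows.append([y[0], y[1], repr(y[2]), y[3]])

# Build the DataFrame once; pandas infers a float64 dtype for 'similarity'
master_df = pd.DataFrame(rows, columns=['a1', 'a2', 'intersection', 'similarity'])
print(master_df.shape)  # prints (3, 4)
```

Rows keep the order in which they appear in `master_list`, with a default `0..n-1` index, so no `sort_index()` call is needed.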
Here is a small part of `master_list`, the data I am trying to insert into the DataFrame:
```python
master_list = [[[['residential property 42 holywell hill st. albans east of england al1 1bx',
                  'gnd flr 38 holywell hill st albans herts al1 1bx',
                  {'1bx', 'al1', 'albans', 'hill', 'holywell'},
                  0.5809767086589066],
                 ['residential property 42 holywell hill st. albans east of england al1 1bx',
                  '62 holywell hill st albans herts al1 1bx',
                  {'1bx', 'al1', 'albans', 'hill', 'holywell'},
                  0.62250400597525191]]],
               [[['aitchisons 2 holywell hill st. albans east of england al1 1bz',
                  '22 holywell hill st albans herts al1 1bz',
                  {'1bz', 'al1', 'albans', 'hill', 'holywell'},
                  0.64696827426453596],
                 ['aitchisons 2 holywell hill st. albans east of england al1 1bz',
                  '24 holywell hill st albans herts al1 1bz',
                  {'1bz', 'al1', 'albans', 'hill', 'holywell'},
                  0.64660269146725069],
                 ['aitchisons 2 holywell hill st. albans east of england al1 1bz',
                  '26 holywell hill st albans herts al1 1bz',
                  {'1bz', 'al1', 'albans', 'hill', 'holywell'},
                  0.64617599950794757],
                 ['aitchisons 2 holywell hill st. albans east of england al1 1bz',
                  '20 holywell hill st albans herts al1 1bz',
                  {'1bz', 'al1', 'albans', 'hill', 'holywell'},
                  0.64798547824947428]]]]
```
Does anyone have a suggestion for converting this 4D list into a pandas DataFrame in a more... pythonic way?
Sam
Answer (score: 3)

Here is one solution:
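```python
import numpy as np
import pandas as pd

def flatten(container):
    # Recursively yield every item that is not itself a list or tuple
    for i in container:
        if isinstance(i, (list, tuple)):
            yield from flatten(i)
        else:
            yield i

def fix_dict(x):
    # The 'intersection' field in master_list is a set, not a dict,
    # so convert sets (as well as dicts) to their string representation
    return repr(x) if isinstance(x, (set, dict)) else x

all_values = [fix_dict(val) for val in flatten(master_list)]
# dtype=object stops NumPy from coercing the floats to strings
# when strings and floats are packed into a single array
values = np.array(all_values, dtype=object).reshape(-1, 4)
master_df = pd.DataFrame(values, columns=['a1', 'a2', 'intersection', 'similarity'])
master_df['similarity'] = master_df['similarity'].astype(float)
```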
This gives the expected output: one row per innermost record, with the four fields as columns.
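Because every record in the sample sits at a fixed depth (group, then sub-list, then record), a recursive flatten followed by a reshape is not strictly necessary. A minimal sketch, assuming that fixed depth holds for the whole of `master_list`, strips the two outer levels with `itertools.chain.from_iterable` and hands the records straight to the `DataFrame` constructor. The `master_list` below is a trimmed copy of the sample data so the snippet is self-contained.

```python
from itertools import chain

import pandas as pd

# Trimmed copy of the sample data (assumption: same nesting as the full list)
master_list = [
    [[['residential property 42 holywell hill st. albans east of england al1 1bx',
       'gnd flr 38 holywell hill st albans herts al1 1bx',
       {'1bx', 'al1', 'albans', 'hill', 'holywell'},
       0.5809767086589066],
      ['residential property 42 holywell hill st. albans east of england al1 1bx',
       '62 holywell hill st albans herts al1 1bx',
       {'1bx', 'al1', 'albans', 'hill', 'holywell'},
       0.62250400597525191]]],
    [[['aitchisons 2 holywell hill st. albans east of england al1 1bz',
       '22 holywell hill st albans herts al1 1bz',
       {'1bz', 'al1', 'albans', 'hill', 'holywell'},
       0.64696827426453596]]],
]

COLUMNS = ['a1', 'a2', 'intersection', 'similarity']

# Each chain.from_iterable call removes one level of nesting:
# the first yields the sub-lists, the second yields the 4-field records.
records = chain.from_iterable(chain.from_iterable(master_list))

# Unpacking into four names also checks that every record has exactly 4 fields
master_df = pd.DataFrame(
    [(a1, a2, repr(inter), sim) for a1, a2, inter, sim in records],
    columns=COLUMNS,
)
print(master_df.shape)  # prints (3, 4)
```

One design difference from the flatten-and-reshape approach: if a record ever had the wrong number of fields, the tuple unpacking raises a `ValueError`, whereas `reshape(-1, 4)` would silently shift values into the wrong columns whenever the total count still happened to be divisible by 4. Column dtypes are also inferred per column, so `similarity` comes out as float without an extra cast.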