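I have the DataFrame below. From each tuple of lists I want to extract the first list, and then transpose what I extracted so that each topic becomes a column.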
import pandas as pd

data = {'Document_No': [0.0, 1.0], 'list_of_topics': [
    ([(0, 0.14572892),
      (1, 0.014889247),
      (11, 0.44593897)],
     [(4, [0]), (5, [4]), (6, [11]), (7, [11]), (8, [11, 4]), (9, [11, 4])],
     [(4, [(0, 0.9999998)]),
      (7, [(11, 0.9999998)]),
      (9, [(4, 0.05520946), (11, 0.93936676)])]),
    ([(0, 0.2453892),
      (11, 0.78657897)],
     [(4, [0]), (5, [4]), (6, [11]), (7, [11]), (8, [11, 4]), (9, [11, 4])],
     [(4, [(0, 0.9999998)]),
      (7, [(11, 0.9999998)]),
      (9, [(4, 0.05520946), (11, 0.93936676)])])
]}
df = pd.DataFrame(data)
Desired result:
   Document_No           0            1           11
0          0.0  0.14572892  0.014889247  0.44593897
1          1.0   0.2453892            0  0.78657897
My attempt:
pd.DataFrame([[j[0] for j in i] for i in df['list_of_topics']], index=df['Document_No']).transpose()
Out[245]:
Document_No 0.0 1.0
0 (0, 0.14572892) (0, 0.14572892)
1 (4, [0]) (4, [0])
2 (4, [(0, 0.9999998)]) (4, [(0, 0.9999998)])
This does not give the desired result. Can anyone help me figure out where I went wrong?
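In the attempt above, `i` is the whole tuple in a cell and `j` runs over the three lists inside it, so `j[0]` picks the first *element* of each list instead of the first list itself; that is why every row of the output shows one entry from each of the three lists. The sketch below demonstrates this and then builds the desired table by turning each first list into a `{topic: probability}` dict. It assumes that only the first list of each tuple matters and that missing topics should be shown as 0, as in the desired result; the variable names `wrong`, `right`, `topics` and `result` are illustrative only.

```python
import pandas as pd

# Same data as in the question
data = {'Document_No': [0.0, 1.0], 'list_of_topics': [
    ([(0, 0.14572892), (1, 0.014889247), (11, 0.44593897)],
     [(4, [0]), (5, [4]), (6, [11]), (7, [11]), (8, [11, 4]), (9, [11, 4])],
     [(4, [(0, 0.9999998)]), (7, [(11, 0.9999998)]),
      (9, [(4, 0.05520946), (11, 0.93936676)])]),
    ([(0, 0.2453892), (11, 0.78657897)],
     [(4, [0]), (5, [4]), (6, [11]), (7, [11]), (8, [11, 4]), (9, [11, 4])],
     [(4, [(0, 0.9999998)]), (7, [(11, 0.9999998)]),
      (9, [(4, 0.05520946), (11, 0.93936676)])]),
]}
df = pd.DataFrame(data)

first_row = df['list_of_topics'][0]

# The original comprehension: j is each of the three lists in the tuple,
# so j[0] is the first element of each list, not the first list
wrong = [j[0] for j in first_row]
# -> [(0, 0.14572892), (4, [0]), (4, [(0, 0.9999998)])]

# What is actually wanted is the first list itself
right = first_row[0]
# -> [(0, 0.14572892), (1, 0.014889247), (11, 0.44593897)]

# One {topic: probability} dict per document; pandas aligns the keys
# into columns and fills topics missing from a document with NaN
topics = pd.DataFrame([dict(t[0]) for t in df['list_of_topics']])
topics = topics.sort_index(axis=1).fillna(0)  # order topic columns, NaN -> 0

# Put Document_No back in front of the topic columns
result = pd.concat([df[['Document_No']], topics], axis=1)
print(result)
```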
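Answer 0 (score: 1)
You can take the first list of each tuple in the column, build a DataFrame from its (topic, probability) records, and outer-merge the documents on the topic id: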
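# Start from the (topic, probability) pairs in the first list of the first document
df1 = pd.DataFrame.from_records(df.list_of_topics[0][0])
# Outer-merge the first list of every further document on the topic id (column 0)
for tup in df.list_of_topics[1:]:
    df1 = df1.merge(pd.DataFrame.from_records(tup[0]), on=0, how='outer')
# Topic ids become the index; transposing turns them into columns
df1.set_index(0, inplace=True)
df1.T.reset_index(drop=True)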
Output:
          0         1        11
0  0.145729  0.014889  0.445939
1  0.245389       NaN  0.786579
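The merge loop above leaves NaN where the desired result has 0 and drops `Document_No`. It also merges one document at a time and relies on the automatic `_x`/`_y` suffixes for the overlapping column `1`, which can repeat once more documents are involved. A different technique, sketched below under the same assumptions as the question (only the first list of each tuple is used, missing topics become 0), avoids the loop: flatten the pairs into a long table of `(Document_No, topic, prob)` rows, then reshape it with `DataFrame.pivot`. The names `long`, `wide`, `topic` and `prob` are illustrative.

```python
import pandas as pd

# Same data as in the question
data = {'Document_No': [0.0, 1.0], 'list_of_topics': [
    ([(0, 0.14572892), (1, 0.014889247), (11, 0.44593897)],
     [(4, [0]), (5, [4]), (6, [11]), (7, [11]), (8, [11, 4]), (9, [11, 4])],
     [(4, [(0, 0.9999998)]), (7, [(11, 0.9999998)]),
      (9, [(4, 0.05520946), (11, 0.93936676)])]),
    ([(0, 0.2453892), (11, 0.78657897)],
     [(4, [0]), (5, [4]), (6, [11]), (7, [11]), (8, [11, 4]), (9, [11, 4])],
     [(4, [(0, 0.9999998)]), (7, [(11, 0.9999998)]),
      (9, [(4, 0.05520946), (11, 0.93936676)])]),
]}
df = pd.DataFrame(data)

# Long format: one row per (document, topic) pair from the first list only
long = pd.DataFrame(
    [(doc, topic, prob)
     for doc, topics in zip(df['Document_No'], df['list_of_topics'])
     for topic, prob in topics[0]],
    columns=['Document_No', 'topic', 'prob'],
)

# Wide format: one row per document, one column per topic, missing -> 0
wide = (long.pivot(index='Document_No', columns='topic', values='prob')
            .fillna(0)
            .reset_index())
wide.columns.name = None  # drop the leftover 'topic' label on the columns
print(wide)
```

Because `pivot` collects every topic in a single reshape, the number of documents does not affect the column names.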