I have a pandas DataFrame like this:
   title              author  year     type
0     t1                  a1  1980  article
1     t2  ['a2', 'a3', 'a4']  1983  article
2     t3                  a5  1982  article
3     t4                  a6  1977  article
4     t5        ['a7', 'a8']  2011     book
This is a short example; the original is much larger.
I need a DataFrame like this:
   title  author  year     type
0     t1      a1  1980  article
1     t2      a2  1983  article
2     t2      a3  1983  article
3     t2      a4  1983  article
4     t3      a5  1982  article
5     t4      a6  1977  article
6     t5      a7  2011     book
7     t5      a8  2011     book
Note that the lists contain different numbers of elements.
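For anyone who wants to reproduce this, here is a minimal sketch that builds the example frame. It assumes the `author` column holds real Python lists mixed with plain strings (not the string representation of a list); the question does not say which it is.

```python
import pandas as pd

# Assumption: multi-author cells are actual Python lists, single-author cells are strings.
df = pd.DataFrame({
    "title": ["t1", "t2", "t3", "t4", "t5"],
    "author": ["a1", ["a2", "a3", "a4"], "a5", "a6", ["a7", "a8"]],
    "year": [1980, 1983, 1982, 1977, 2011],
    "type": ["article", "article", "article", "article", "book"],
})

print(df)
```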
Answer 0: (score: 1)
import pandas as pd

# Expand each list of authors into separate rows and build an authors DataFrame
df_author = df.author.apply(pd.Series).stack().rename('author').reset_index()
# Join the authors DataFrame back to the original on the original row index
pd.merge(df_author, df, left_on='level_0', right_index=True, suffixes=('', '_old'))[df.columns]
Out[184]:
   title  author  year     type
0     t1      a1  1980  article
1     t2      a2  1983  article
2     t2      a3  1983  article
3     t2      a4  1983  article
4     t3      a5  1982  article
5     t4      a6  1977  article
6     t5      a7  2011     book
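As an alternative to the approach above (not the answerer's method), newer pandas versions (0.25 and later) provide `DataFrame.explode`, which does the same expansion in one call. The sketch below assumes the frame from the question; `parse_authors` is a hypothetical helper, only needed if the lists are actually stored as strings such as "['a2', 'a3']".

```python
import ast

import pandas as pd

df = pd.DataFrame({
    "title": ["t1", "t2", "t3", "t4", "t5"],
    "author": ["a1", ["a2", "a3", "a4"], "a5", "a6", ["a7", "a8"]],
    "year": [1980, 1983, 1982, 1977, 2011],
    "type": ["article", "article", "article", "article", "book"],
})


def parse_authors(value):
    """Hypothetical helper: turn a string like "['a2', 'a3']" into a real list.

    Real lists and plain author names are returned unchanged.
    """
    if isinstance(value, str) and value.strip().startswith("["):
        return ast.literal_eval(value)
    return value


# Normalise the column first so that explode sees real lists.
df["author"] = df["author"].map(parse_authors)

# explode puts each list element on its own row and repeats the other columns;
# scalar entries pass through unchanged. reset_index gives a fresh 0..n-1 index.
result = df.explode("author").reset_index(drop=True)

print(result)
```

On a large frame this is probably also faster than `apply(pd.Series)`, since that builds one Series per row, but I have not measured it.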