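I have a pandas DataFrame whose first column holds lists. I want to loop over every string in each list and pair it with the values of the other columns from the same row.
For example: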
import pandas as pd
tm = pd.DataFrame({'author':[['author_a1','author_a2','author_a3'],['author_b1','author_b2'],['author_c1','author_c2']],'journal':['journal01','journal02','journal03'],'date':pd.date_range('2015-02-03',periods=3)})
tm
author date journal
0 [author_a1, author_a2, author_a3] 2015-02-03 journal01
1 [author_b1, author_b2] 2015-02-04 journal02
2 [author_c1, author_c2] 2015-02-05 journal03
The result I want:
author date journal
0 author_a1 2015-02-03 journal01
1 author_a2 2015-02-03 journal01
2 author_a3 2015-02-03 journal01
3 author_b1 2015-02-04 journal02
4 author_b2 2015-02-04 journal02
5 author_c1 2015-02-05 journal03
6 author_c2 2015-02-05 journal03
I solved it with a convoluted approach. Is there a simple, efficient way to do this with pandas?
author_use = []
date_use = []
journal_use = []
for i in range(0, len(tm['author'])):
    for m in range(0, len(tm['author'][i])):
        author_use.append(tm['author'][i][m])
        date_use.append(tm['date'][i])
        journal_use.append(tm['journal'][i])
df_author = pd.DataFrame({'author': author_use,
                          'date': date_use,
                          'journal': journal_use,
                          })
df_author
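The same loop can be written more compactly by walking the three columns in lockstep with zip instead of indexing by position. This is a minimal sketch that rebuilds tm so it runs on its own; it is still a pure-Python loop, so it only tidies the question's approach and does not make it faster in a vectorized sense.

```python
import pandas as pd

tm = pd.DataFrame({'author': [['author_a1', 'author_a2', 'author_a3'],
                              ['author_b1', 'author_b2'],
                              ['author_c1', 'author_c2']],
                   'journal': ['journal01', 'journal02', 'journal03'],
                   'date': pd.date_range('2015-02-03', periods=3)})

# One output tuple per author; the row's date and journal are repeated
# for every author in that row's list.
rows = [(author, date, journal)
        for authors, date, journal in zip(tm['author'], tm['date'], tm['journal'])
        for author in authors]
df_author = pd.DataFrame(rows, columns=['author', 'date', 'journal'])
print(df_author)
```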
Answer (score: 2):
I think you can use numpy.repeat with the list lengths from str.len as the repeat counts, and flatten the nested lists with chain:
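import numpy as np
import pandas as pd
from itertools import chain

# number of authors in each row
lens = tm.author.str.len()
df = pd.DataFrame({
    "author": list(chain.from_iterable(tm.author)),  # flatten the nested lists
    "date": np.repeat(tm.date.values, lens),         # repeat each date lens[i] times
    "journal": np.repeat(tm.journal.values, lens)})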
print (df)
author date journal
0 author_a1 2015-02-03 journal01
1 author_a2 2015-02-03 journal01
2 author_a3 2015-02-03 journal01
3 author_b1 2015-02-04 journal02
4 author_b2 2015-02-04 journal02
5 author_c1 2015-02-05 journal03
6 author_c2 2015-02-05 journal03
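To see why this works, it may help to inspect the intermediate values. The sketch below (rebuilding tm so it runs standalone) shows the per-row list lengths from str.len, the journal values expanded by np.repeat, and the authors flattened by chain.from_iterable. All three end up with 7 entries, which is what lets them line up as columns of the new frame.

```python
from itertools import chain

import numpy as np
import pandas as pd

tm = pd.DataFrame({'author': [['author_a1', 'author_a2', 'author_a3'],
                              ['author_b1', 'author_b2'],
                              ['author_c1', 'author_c2']],
                   'journal': ['journal01', 'journal02', 'journal03'],
                   'date': pd.date_range('2015-02-03', periods=3)})

# .str.len() on a column of lists returns the length of each list
lens = tm.author.str.len()
# np.repeat repeats element i of the array lens[i] times
journals = np.repeat(tm.journal.values, lens)
# chain.from_iterable concatenates the inner lists into one flat sequence
authors = list(chain.from_iterable(tm.author))

print(lens.tolist())      # [3, 2, 2]
print(journals.tolist())
print(authors)
```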
Another numpy solution:
df = pd.DataFrame(
    np.column_stack((
        tm[['date', 'journal']].values.repeat(list(map(len, tm.author)), axis=0),
        np.hstack(tm.author),
    )),
    columns=['date', 'journal', 'author'])
print (df)
date journal author
0 2015-02-03 00:00:00 journal01 author_a1
1 2015-02-03 00:00:00 journal01 author_a2
2 2015-02-03 00:00:00 journal01 author_a3
3 2015-02-04 00:00:00 journal02 author_b1
4 2015-02-04 00:00:00 journal02 author_b2
5 2015-02-05 00:00:00 journal03 author_c1
6 2015-02-05 00:00:00 journal03 author_c2
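For readers on pandas 0.25 or later, DataFrame.explode performs this reshaping directly, without numpy or itertools. This is a minimal sketch assuming a pandas version that provides explode; it is an alternative to the answers above, not part of them. explode keeps the original index labels (0, 0, 0, 1, ...), so reset_index(drop=True) renumbers the rows 0 to 6 as in the desired output, and the date column keeps its datetime64 dtype.

```python
import pandas as pd

tm = pd.DataFrame({'author': [['author_a1', 'author_a2', 'author_a3'],
                              ['author_b1', 'author_b2'],
                              ['author_c1', 'author_c2']],
                   'journal': ['journal01', 'journal02', 'journal03'],
                   'date': pd.date_range('2015-02-03', periods=3)})

# explode gives each list element its own row and repeats the other
# columns' values; reset_index(drop=True) replaces the duplicated labels.
df = tm.explode('author').reset_index(drop=True)
print(df[['author', 'date', 'journal']])
```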