I have a DataFrame like this:
Text Mail Phone
text_1 mail_1,mail_2,mail_3 ['phone_1', 'phone_2']
text_2 mail_4,mail_5 ['phone_3', 'phone_4']
text_3 mail_6, mail_7,mail_8 ['phone_5']
. . .
text_n mail_x ['phone_y', 'phone_y+1']
. . .
I would like to get a DataFrame like this:
Text Mail Phone
text_1 mail_1 phone_1
text_1 mail_2 phone_2
text_1 mail_3 ?
text_2 mail_4 phone_3
text_2 mail_5 phone_4
text_3 mail_6 phone_5
text_3 mail_7 ?
text_3 mail_8 ?
text_n mail_x phone_y
text_n ? phone_y+1
For each row of the initial DataFrame, the number of mails and phones varies and can be 0.
Regards
Answer 0 (score: 1)
Use zip_longest() from itertools and rebuild the DataFrame:
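import pandas as pd
from itertools import zip_longest

# Split each row's mail string, strip stray spaces (e.g. "mail_6, mail_7"),
# and pair mails with phones by position; zip_longest pads the shorter side with None.
# Assumes Mail is a comma-separated string and Phone is already a Python list.
df_new = pd.DataFrame([
    [t, m, p] for t, M, P in df.values
    for m, p in zip_longest([s.strip() for s in M.split(',') if s.strip()], P)
], columns=df.columns)
df_new.fillna('?', inplace=True)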
#In [x]: df_new
#Out[x]:
# Text Mail Phone
#0 text_1 mail_1 phone_1
#1 text_1 mail_2 phone_2
#2 text_1 mail_3 ?
#3 text_2 mail_4 phone_3
#4 text_2 mail_5 phone_4
#5 text_3 mail_6 phone_5
#6 text_3 mail_7 ?
#7 text_3 mail_8 ?
#8 text_n mail_x phone_y
#9 text_n ? phone_y+1
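A self-contained sketch of the same zip_longest idea that also covers the zero-count case from the question. Assumptions (not from the answer above): Mail holds comma-separated strings (or NaN), Phone holds Python lists or their string form as read back from a CSV, and a row with neither mails nor phones is kept as a single `?`/`?` row rather than dropped. The sample data, the `text_4` row, and the helper names `split_mails` and `to_phone_list` are hypothetical. It passes `fillvalue='?'` to `zip_longest` directly instead of calling `fillna` afterwards.

```python
import ast
from itertools import zip_longest

import pandas as pd

# Hypothetical sample data mirroring the question; text_4 has no mails and no phones
df = pd.DataFrame({
    "Text": ["text_1", "text_2", "text_3", "text_4"],
    "Mail": ["mail_1,mail_2,mail_3", "mail_4,mail_5", "mail_6, mail_7,mail_8", ""],
    "Phone": [["phone_1", "phone_2"], ["phone_3", "phone_4"], ["phone_5"], []],
})


def split_mails(value):
    # Comma-separated string -> list of stripped, non-empty addresses; NaN/None -> []
    if not isinstance(value, str):
        return []
    return [m.strip() for m in value.split(",") if m.strip()]


def to_phone_list(value):
    # Accept a real list, or its string form such as "['phone_1']"; anything else -> []
    if isinstance(value, str):
        value = ast.literal_eval(value) if value.strip() else []
    return list(value) if isinstance(value, (list, tuple)) else []


rows = []
for text, mails, phones in df[["Text", "Mail", "Phone"]].itertuples(index=False):
    # Pair by position; the shorter side is padded with '?'
    pairs = list(zip_longest(split_mails(mails), to_phone_list(phones), fillvalue="?"))
    # Assumption: a row with zero mails and zero phones becomes one '?'/'?' row
    for mail, phone in pairs or [("?", "?")]:
        rows.append((text, mail, phone))

df_new = pd.DataFrame(rows, columns=["Text", "Mail", "Phone"])
print(df_new)
```

If rows with no mails and no phones should disappear instead (as in the answer's version), drop the `or [("?", "?")]` fallback.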