I am trying to create duplicated rows in a DataFrame based on a condition.
For example, I have this DataFrame:
team  student
a     Ursula
b     Hayfa, Martin
c     Kato
d     Tanek, Ava, Pyto
e     Aiko
f     Hunter
g     Josiah, Derek, Uma, Nell
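To make the example reproducible, the frame above could be built like this. This is a minimal sketch; the variable name df is an assumption, chosen to match the code in the answer.

```python
import pandas as pd

# Sample data copied from the table above; teams with several members
# store them as one comma-separated string.
df = pd.DataFrame({
    "team": list("abcdefg"),
    "student": [
        "Ursula",
        "Hayfa, Martin",
        "Kato",
        "Tanek, Ava, Pyto",
        "Aiko",
        "Hunter",
        "Josiah, Derek, Uma, Nell",
    ],
})
print(df)
```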
Desired output:
team  student                   name    remark
a     Ursula                    Ursula
b     Hayfa, Martin             Hayfa   with Martin
b     Hayfa, Martin             Martin  with Hayfa
c     Kato                      Kato
d     Tanek, Ava, Pyto          Tanek   with Ava, Pyto
d     Tanek, Ava, Pyto          Ava     with Tanek, Pyto
d     Tanek, Ava, Pyto          Pyto    with Tanek, Ava
e     Aiko                      Aiko
f     Hunter                    Hunter
g     Josiah, Derek, Uma, Nell  Josiah  with Derek, Uma, Nell
g     Josiah, Derek, Uma, Nell  Derek   with Josiah, Uma, Nell
g     Josiah, Derek, Uma, Nell  Uma     with Josiah, Derek, Nell
g     Josiah, Derek, Uma, Nell  Nell    with Josiah, Derek, Uma
Answer 0 (score: 0)
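For pandas 0.25+, you can combine DataFrame.explode with Series.str.split to get one row per student. For the remark column, use a list comprehension that filters out the row's own name: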
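# Split each comma-separated string into a list of names
s = df['student'].str.split(', ')
# Keep the full list in 'remark', then emit one row per name
df = df.assign(name=s, remark=s).explode('name').reset_index(drop=True)
# Build the remark from the teammates other than the row's own name
df['remark'] = ['with ' + ', '.join(x for x in b if x != a)
                if len(b) > 1
                else ''
                for a, b in zip(df['name'], df['remark'])]
print(df)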
    team  student                   name    remark
 0  a     Ursula                    Ursula
 1  b     Hayfa, Martin             Hayfa   with Martin
 2  b     Hayfa, Martin             Martin  with Hayfa
 3  c     Kato                      Kato
 4  d     Tanek, Ava, Pyto          Tanek   with Ava, Pyto
 5  d     Tanek, Ava, Pyto          Ava     with Tanek, Pyto
 6  d     Tanek, Ava, Pyto          Pyto    with Tanek, Ava
 7  e     Aiko                      Aiko
 8  f     Hunter                    Hunter
 9  g     Josiah, Derek, Uma, Nell  Josiah  with Derek, Uma, Nell
10  g     Josiah, Derek, Uma, Nell  Derek   with Josiah, Uma, Nell
11  g     Josiah, Derek, Uma, Nell  Uma     with Josiah, Derek, Nell
12  g     Josiah, Derek, Uma, Nell  Nell    with Josiah, Derek, Uma
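
As a self-contained sketch of the same approach, the steps can be wrapped in a function and tried on a two-team frame. The helper name add_name_and_remark, its col and sep parameters, and the small frame are illustrative assumptions, not part of the original answer. DataFrame.explode requires pandas 0.25 or later.

```python
import pandas as pd


def add_name_and_remark(df, col="student", sep=", "):
    """Hypothetical helper: one row per student, plus 'name' and 'remark'."""
    # Split each comma-separated string into a list of names.
    members = df[col].str.split(sep)
    # Store the list in both new columns, then explode only 'name' so every
    # resulting row still carries the full member list in 'remark'.
    out = (
        df.assign(name=members, remark=members)
          .explode("name")
          .reset_index(drop=True)
    )
    # Replace the list with "with <teammates>", or an empty string when the
    # student is alone in the team.
    out["remark"] = [
        "with " + ", ".join(m for m in group if m != name) if len(group) > 1 else ""
        for name, group in zip(out["name"], out["remark"])
    ]
    return out


small = pd.DataFrame({"team": ["a", "b"], "student": ["Ursula", "Hayfa, Martin"]})
result = add_name_and_remark(small)
print(result)
```

Because assign returns a new frame, the input is left unchanged. Note that the filter compares names by value, so two identically named students in one team would both be dropped from each other's remark.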