How do I remove duplicates column by column in a pandas DataFrame, so that each column keeps only its distinct values?
The input DataFrame:
set1    set2    set3    set4
apple   apple   orange  orange
apple   orange  banana  orange
orange  banana  pear    NaN
banana  banana  lemon   NaN
pear    NaN     lemon   NaN
grape   NaN     lemon   NaN
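To try the answers below, the sample data can be rebuilt roughly as follows. This is only a sketch: the column alignment is inferred from the outputs shown in the answers, and treating the blank cells as NaN is an assumption.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "set1": ["apple", "apple", "orange", "banana", "pear", "grape"],
    "set2": ["apple", "orange", "banana", "banana", np.nan, np.nan],
    "set3": ["orange", "banana", "pear", "lemon", "lemon", "lemon"],
    "set4": ["orange", "orange", np.nan, np.nan, np.nan, np.nan],
})

# Number of distinct non-missing values per column.
distinct_counts = df.nunique()
```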
Answer 0 (score: 3)
Use:
# dict.fromkeys keeps the first occurrence of each value, in original order
m = df.apply(lambda x: dict.fromkeys(x).keys())
pd.DataFrame(m.values.tolist(), index=m.index).T
Or, in a nicer way, courtesy of @piRSquared:
pd.DataFrame.from_dict({k: {*df[k].dropna()} for k in df}, orient='index').T
set1 set2 set3 set4
0 apple apple orange orange
1 orange orange banana NaN
2 banana banana pear None
3 pear NaN lemon None
4 grape None None None
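Note that the set-based one-liner does not guarantee any particular order within a column, because Python sets are unordered. If the first-seen order matters, one possible variant (a sketch, not part of the original answer) is to use `Series.unique`, which keeps values in order of appearance. The sample `df` here is an assumed reconstruction of the question's data.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data (blank cells taken as NaN).
df = pd.DataFrame({
    "set1": ["apple", "apple", "orange", "banana", "pear", "grape"],
    "set2": ["apple", "orange", "banana", "banana", np.nan, np.nan],
    "set3": ["orange", "banana", "pear", "lemon", "lemon", "lemon"],
    "set4": ["orange", "orange", np.nan, np.nan, np.nan, np.nan],
})

# unique() preserves first-seen order; wrapping each result in a Series lets
# the DataFrame constructor pad the shorter columns with NaN.
deduped = pd.DataFrame({c: pd.Series(df[c].dropna().unique()) for c in df})
```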
Answer 1 (score: 3)
itertools.zip_longest
from itertools import zip_longest
pd.DataFrame(
[*zip_longest(*({*df[c].dropna()} for c in df))],
columns=[*df]
)
set1 set2 set3 set4
0 banana orange banana orange
1 grape banana lemon None
2 pear apple pear None
3 apple None orange None
4 orange None None None
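The row order in the output above is arbitrary because each column passes through a set. A hedged variant of the same `zip_longest` idea swaps the set for `dict.fromkeys`, which keeps first-seen order. The sample `df` is an assumed reconstruction of the question's data.

```python
from itertools import zip_longest

import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data (blank cells taken as NaN).
df = pd.DataFrame({
    "set1": ["apple", "apple", "orange", "banana", "pear", "grape"],
    "set2": ["apple", "orange", "banana", "banana", np.nan, np.nan],
    "set3": ["orange", "banana", "pear", "lemon", "lemon", "lemon"],
    "set4": ["orange", "orange", np.nan, np.nan, np.nan, np.nan],
})

# One ordered list of distinct values per column.
cols = [list(dict.fromkeys(df[c].dropna())) for c in df]

# zip_longest turns the columns into rows, padding short columns with None.
out = pd.DataFrame(list(zip_longest(*cols)), columns=list(df))
```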
collections.defaultdict and itertools.count
# %%timeit
from collections import defaultdict
from itertools import count
i = defaultdict(count)
pd.DataFrame({c: {next(i[c]): v for v in {*df[c].dropna()}} for c in df})
set1 set2 set3 set4
0 pear apple orange orange
1 grape banana lemon NaN
2 apple orange banana NaN
3 banana NaN pear NaN
4 orange NaN NaN NaN
Answer 2 (score: 3)
Here is another way, using pivot:
df.melt().dropna().drop_duplicates(['variable','value']).\
assign(key=lambda x : x.groupby('variable').cumcount()).pivot(index='key',columns='variable',values='value')
Out[806]:
variable set1 set2 set3 set4
key
0 apple apple orange orange
1 orange orange banana NaN
2 banana banana pear NaN
3 pear NaN lemon NaN
4 grape NaN NaN NaN
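The chained expression above can be easier to follow when split into steps. The following is a sketch of the same melt, de-duplicate, and pivot sequence; the intermediate names `long_form` and `wide` are illustrative, and the sample `df` is an assumed reconstruction of the question's data.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data (blank cells taken as NaN).
df = pd.DataFrame({
    "set1": ["apple", "apple", "orange", "banana", "pear", "grape"],
    "set2": ["apple", "orange", "banana", "banana", np.nan, np.nan],
    "set3": ["orange", "banana", "pear", "lemon", "lemon", "lemon"],
    "set4": ["orange", "orange", np.nan, np.nan, np.nan, np.nan],
})

# 1. Long format: one row per (variable, value) pair, missing values removed.
long_form = df.melt().dropna()

# 2. Keep only the first occurrence of each value within each column.
long_form = long_form.drop_duplicates(["variable", "value"])

# 3. Number the surviving values 0, 1, 2, ... within each column.
long_form = long_form.assign(key=long_form.groupby("variable").cumcount())

# 4. Back to wide format, using that counter as the new row index.
wide = long_form.pivot(index="key", columns="variable", values="value")
```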
Answer 3 (score: 1)
You can also use drop_duplicates:
df.apply(lambda x : x.drop_duplicates().reset_index(drop=True))
set1 set2 set3 set4
0 apple apple orange orange
1 orange orange banana NaN
2 banana banana pear NaN
3 pear NaN lemon NaN
4 grape NaN NaN NaN
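One caveat: `drop_duplicates` treats NaN as an ordinary value and keeps its first occurrence. That is harmless when the missing values sit at the bottom of a column, as in the question, but a NaN in the middle of a column would survive in place. A possible adjustment, sketched on a small hypothetical frame `demo`, is to call `dropna()` first.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a NaN in the middle of column "a".
demo = pd.DataFrame({
    "a": ["x", np.nan, "x", "y"],
    "b": ["p", "p", np.nan, np.nan],
})

# As in the answer: the first NaN in each column is kept where it appears.
plain = demo.apply(lambda col: col.drop_duplicates().reset_index(drop=True))

# With dropna() first: only real values remain, padded with NaN at the end.
cleaned = demo.apply(
    lambda col: col.dropna().drop_duplicates().reset_index(drop=True)
)
```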