I have a Pandas DataFrame in the following format:
[apple]
[banana]
[apple, orange]
I want to transform it so that it contains only the unique values, with each value on its own row:
apple
banana
orange
Answer 0 (score: 2)
First unnest your lists into rows, then use drop_duplicates:
import pandas as pd
# Make example dataframe
df = pd.DataFrame({'Col1':[['apple'], ['banana'], ['apple', 'orange']]})
              Col1
0          [apple]
1         [banana]
2  [apple, orange]
Using the explode_list function from the linked answer:
df = explode_list(df, 'Col1').drop_duplicates()
Output:
     Col1
0   apple
1  banana
2  orange
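The explode_list helper is not shown in this answer. As a rough alternative sketch, and not the linked answer's function, pandas 0.25 and later ship a built-in DataFrame.explode that does the same unnesting step. The variable name result is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [['apple'], ['banana'], ['apple', 'orange']]})

# DataFrame.explode (pandas >= 0.25) gives each list element its own row,
# repeating the original index label for elements from the same list.
result = (
    df.explode('Col1')
      .drop_duplicates()          # keep the first occurrence of each value
      .reset_index(drop=True)     # renumber rows 0..n-1
)
print(result)
```

The reset_index call is optional. Without it, the surviving rows keep the index labels of the lists they came from.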
Answer 1 (score: 2)
You can use itertools.chain with from_iterable() to flatten the list of lists, and OrderedDict to remove duplicates while preserving order:
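from collections import OrderedDict
import itertools
# list(...) materializes the keys view. Note that this assignment only works
# because the number of unique values (3) happens to equal the number of rows (3).
df['col2'] = list(OrderedDict.fromkeys(itertools.chain.from_iterable(df.col)).keys())
print(df)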
               col    col2
0          [apple]   apple
1         [banana]  banana
2  [apple, orange]  orange
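Assigning the unique values back as a new column only works when their count matches the number of rows. A minimal sketch of a variant that avoids this, assuming Python 3.7+ (where a plain dict preserves insertion order) and building a separate DataFrame instead; the extra fourth row and the names unique_values and result are made up for illustration.

```python
from itertools import chain

import pandas as pd

# Four rows, but only three distinct values (hypothetical example data).
df = pd.DataFrame({'col': [['apple'], ['banana'], ['apple', 'orange'], ['banana']]})

# chain.from_iterable flattens the lists; dict.fromkeys drops duplicates
# while keeping the order in which each value was first seen.
unique_values = list(dict.fromkeys(chain.from_iterable(df['col'])))

# A new DataFrame sidesteps the length mismatch with the original rows.
result = pd.DataFrame({'col2': unique_values})
print(result)
```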