The df I am working with looks like this:
col1 col2
A ['1','2','er']
A []
B ['1','3','4','abc']
B ['5']
C []
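For each value in col1, I want to compute the percentage of each element of the lists in col2, e.g. the percentage of 1 for A, the percentage of 2 for A, and the percentage of abc for B. I am looking for a way to do this iteratively. Thanks.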
Link to the input data (before explode): https://drive.google.com/file/d/1fuOBo8PK1heAtfufBlplXXfh4FiLpBCD/view?usp=sharing
Link to the output after explode: https://drive.google.com/file/d/1mcArrsu3TWJC6hYZ2kIHAkAzCaHd1DLH/view?usp=sharing
Answer (score: 2)
I believe you need DataFrame.explode and DataFrame.dropna:
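import pandas as pd

# sample data changed to give a better example
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C'],
                   'col2': [['1', '2', '1'], [], ['3', 'abc', 'abc'], ['abc'], []]})
print (df)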
col1 col2
0 A [1, 2, 1]
1 A []
2 B [3, abc, abc]
3 B [abc]
4 C []
df2 = df.explode('col2').dropna(subset=['col2'])
print (df2)
col1 col2
0 A 1
0 A 2
0 A 1
2 B 3
2 B abc
2 B abc
3 B abc
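A minimal sketch of why `dropna` is needed after `explode`: an empty list explodes into a single row whose value is `NaN`, and that row would otherwise end up in the counts. The two-row frame here is illustrative only, not the asker's data.

```python
import pandas as pd

# Illustrative frame: one non-empty list and one empty list
df = pd.DataFrame({'col1': ['A', 'C'], 'col2': [['1', '2'], []]})

exploded = df.explode('col2')
# The empty list for 'C' becomes one row whose col2 is NaN
print(exploded)

cleaned = exploded.dropna(subset=['col2'])
# Only rows that came from real list elements remain
print(cleaned)
```

This is why C, which only has empty lists, does not appear in the final result.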
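Then use SeriesGroupBy.value_counts with normalize=True (the resulting % column holds fractions between 0 and 1; multiply by 100 for percentages):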
df2 = df2.groupby('col1')['col2'].value_counts(normalize=True).reset_index(name='%')
print (df2)
col1 col2 %
0 A 1 0.666667
1 A 2 0.333333
2 B abc 0.750000
3 B 3 0.250000
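Since the question asks for an iterative approach, here is a minimal sketch that swaps in a plain-Python loop with `collections.Counter` instead of `explode`/`value_counts`. It assumes "percentage" means each element's share of all list elements within its col1 group, scaled to 0 to 100, and that groups containing only empty lists (like C) are dropped, matching the output above.

```python
from collections import Counter, defaultdict

import pandas as pd

# Same sample data as in the answer; col2 holds Python lists of strings
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C'],
                   'col2': [['1', '2', '1'], [], ['3', 'abc', 'abc'], ['abc'], []]})

# Tally list elements per col1 value with an explicit loop
counts = defaultdict(Counter)
for key, items in zip(df['col1'], df['col2']):
    counts[key].update(items)

# Convert each tally to percentages of that group's total element count
rows = []
for key, counter in counts.items():
    total = sum(counter.values())
    if total == 0:
        continue  # groups with only empty lists (like C) have nothing to report
    for item, n in counter.items():
        rows.append({'col1': key, 'col2': item, '%': 100 * n / total})

result = pd.DataFrame(rows)
print(result)
```

The vectorized `value_counts` version is usually faster on large frames; the loop is mainly useful when you need custom per-element logic.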
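Edit: in the linked file, the lists in col2 are read from the CSV as strings, so convert them to real lists with ast.literal_eval before exploding: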
import ast
import pandas as pd

df = pd.read_csv('beforeexplode.csv')
# col2 is stored as text like "['a', 'b']"; parse it into Python lists
df['col2'] = df['col2'].apply(ast.literal_eval)
df2 = df.explode('col2').dropna(subset=['col2'])
print (df2)
col1 col2
0 dev1 android
1 dev1 android
2 dev3 oscp
2 dev3 gpen
2 dev3 ceh
.. ... ...
206 dev2 wcag
207 dev2 linux
207 dev2 unix
208 dev2 linux
208 dev2 unix
[460 rows x 2 columns]
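A minimal sketch of why the `ast.literal_eval` step matters: writing lists to CSV stores their string representation, so after `read_csv` each cell is a `str` and `explode` would leave it untouched. The in-memory CSV below is hypothetical (its rows only loosely mirror the output above) and stands in for `beforeexplode.csv`.

```python
import ast
import io

import pandas as pd

# Hypothetical CSV text: each list is stored as its string representation
csv_text = """col1,col2
dev1,"['android']"
dev2,"['linux', 'unix']"
dev3,[]
"""

df = pd.read_csv(io.StringIO(csv_text))
# Before conversion, col2 holds plain strings such as "['android']"
print(type(df.loc[0, 'col2']))

# Parse each string back into a real Python list
df['col2'] = df['col2'].apply(ast.literal_eval)

# Now explode works as intended; the empty list becomes NaN and is dropped
df2 = df.explode('col2').dropna(subset=['col2'])
print(df2)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for parsing these cells.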