My column has more than 800 rows, which look like this:
0 ['Overgrow', 'Chlorophyll']
1 ['Overgrow', 'Chlorophyll']
2 ['Overgrow', 'Chlorophyll']
3 ['Blaze', 'Solar Power']
4 ['Blaze', 'Solar Power']
5 ['Blaze', 'Solar Power']
6 ['Torrent', 'Rain Dish']
7 ['Torrent', 'Rain Dish']
8 ['Torrent', 'Rain Dish']
9 ['Shield Dust', 'Run Away']
10 ['Shed Skin']
11 ['Compoundeyes', 'Tinted Lens']
12 ['Shield Dust', 'Run Away']
13 ['Shed Skin']
14 ['Swarm', 'Sniper']
15 ['Keen Eye', 'Tangled Feet', 'Big Pecks']
16 ['Keen Eye', 'Tangled Feet', 'Big Pecks']
17 ['Keen Eye', 'Tangled Feet', 'Big Pecks']
Here is what I did to get the second part:
import re
list_ability = df_pokemon['abilities'].tolist()
new_list = []
for i in range(0, len(list_ability)):
    # pull every '...'-quoted ability name out of the row's string
    m = re.findall(r"'(.*?)'", list_ability[i], re.DOTALL)
    for j in range(0, len(m)):
        new_list.append(m[j])
list1 = set(new_list)
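I was able to get the unique string values into a list, but is there a better way? I would also like a count for each value, e.g.: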
'Overgrow' - 3
'Chlorophyll' - 3
'Blaze' - 3
'Shield Dust' - 2 ... and so on
(By the way, the column is 'abilities' from the dataframe df_pokemon.)
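Answer 0 (score: 3)
Since the values are strings, you can use a regex and split to turn them into lists, then count them with collections.Counter, along the lines @JonClements mentioned in the comments, i.e.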
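import pandas as pd
from collections import Counter
# Strip [ ] and quotes (regex=True is required on pandas >= 2.0), split on ", " so names get no leading space, count per row, then add the Counters up
count = pd.Series(df_pokemon['abilities'].str.replace(r"[\[\]']", '', regex=True).str.split(', ').map(Counter).sum())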
Output:
Big Pecks 3
Chlorophyll 3
Rain Dish 3
Run Away 2
Sniper 1
Solar Power 3
Tangled Feet 3
Tinted Lens 1
Blaze 3
Compoundeyes 1
Keen Eye 3
Overgrow 3
Shed Skin 2
Shield Dust 2
Swarm 1
Torrent 3
dtype: int64
To list only the values that appear exactly once:
count[count == 1].index.tolist()
['Sniper', 'Tinted Lens', 'Compoundeyes', 'Swarm']
To get a list of all distinct abilities (the index):
count.index.tolist()
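To check the strip/split/count logic without pandas, here is a minimal pure-Python sketch. It assumes each cell is a string formatted exactly like "['Overgrow', 'Chlorophyll']"; the rows below are a hypothetical subset of the question's data. It splits on ", " rather than ",", which would otherwise leave a leading space on every second name, and flattens the rows with itertools.chain.from_iterable before counting.

```python
import re
from collections import Counter
from itertools import chain

# Hypothetical subset of the 'abilities' column, stored as list-shaped strings
rows = [
    "['Overgrow', 'Chlorophyll']",
    "['Shield Dust', 'Run Away']",
    "['Shed Skin']",
    "['Shield Dust', 'Run Away']",
    "['Swarm', 'Sniper']",
]

def parse(cell):
    # Remove [ ] and ' characters, then split on comma + space
    return re.sub(r"[\[\]']", "", cell).split(", ")

# Flatten all rows into one stream of names and tally them
counts = Counter(chain.from_iterable(parse(cell) for cell in rows))

# Names that occur exactly once, in order of first appearance
once = [name for name, n in counts.items() if n == 1]

print(counts["Shield Dust"])  # 2
print(once)  # ['Overgrow', 'Chlorophyll', 'Shed Skin', 'Swarm', 'Sniper']
```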
Answer 1 (score: 2)
Use value_counts:
In [1845]: counts = pd.Series(np.concatenate(df_pokemon.abilities)).value_counts()
In [1846]: counts
Out[1846]:
Rain Dish 3
Keen Eye 3
Chlorophyll 3
Blaze 3
Solar Power 3
Overgrow 3
Big Pecks 3
Tangled Feet 3
Torrent 3
Shield Dust 2
Shed Skin 2
Run Away 2
Compoundeyes 1
Swarm 1
Tinted Lens 1
Sniper 1
dtype: int64
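On pandas 0.25 or later, Series.explode is another option (not part of this answer) that avoids np.concatenate: it turns each list into one row per element, after which value_counts works as above. A minimal sketch, assuming the cells are already Python lists; the DataFrame is a hypothetical sample.

```python
import pandas as pd

# Hypothetical sample: each cell is already a Python list
df_pokemon = pd.DataFrame({"abilities": [
    ["Overgrow", "Chlorophyll"],
    ["Overgrow", "Chlorophyll"],
    ["Swarm", "Sniper"],
]})

# One row per ability, then count occurrences of each name
counts = df_pokemon["abilities"].explode().value_counts()

print(counts["Overgrow"])  # 2
print(sorted(counts.index.tolist()))  # ['Chlorophyll', 'Overgrow', 'Sniper', 'Swarm']
```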
For the unique values, you can do:
In [1850]: counts.index.tolist()
Out[1850]:
['Rain Dish', 'Keen Eye', 'Chlorophyll', 'Blaze', 'Solar Power', 'Overgrow',
'Big Pecks', 'Tangled Feet', 'Torrent', 'Shield Dust', 'Shed Skin', 'Run Away',
'Compoundeyes', 'Swarm', 'Tinted Lens', 'Sniper']
Or:
In [1849]: np.unique(np.concatenate(df_pokemon.abilities))
Out[1849]:
array(['Big Pecks', 'Blaze', 'Chlorophyll', 'Compoundeyes', 'Keen Eye',
'Overgrow', 'Rain Dish', 'Run Away', 'Shed Skin', 'Shield Dust',
'Sniper', 'Solar Power', 'Swarm', 'Tangled Feet', 'Tinted Lens',
'Torrent'],
dtype='|S12')
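If NumPy isn't otherwise needed, the standard library gives the same sorted list of distinct values as np.unique. A minimal sketch, assuming every entry is already a Python list; the data is a hypothetical subset.

```python
from itertools import chain

# Hypothetical subset of the column, already converted to lists
abilities = [
    ["Overgrow", "Chlorophyll"],
    ["Blaze", "Solar Power"],
    ["Overgrow", "Chlorophyll"],
    ["Shed Skin"],
]

# Flatten, deduplicate with a set, then sort alphabetically
unique_sorted = sorted(set(chain.from_iterable(abilities)))
print(unique_sorted)  # ['Blaze', 'Chlorophyll', 'Overgrow', 'Shed Skin', 'Solar Power']
```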
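Note: as Jon's comments point out, this assumes type(df_pokemon.abilities[0]) is list. If it is not (for example, the column holds strings like "['Overgrow', 'Chlorophyll']"), convert the strings to lists first:
import ast
df_pokemon.abilities = df_pokemon.abilities.map(ast.literal_eval)
After the conversion, each entry is a list: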
In [1842]: df_pokemon
Out[1842]:
abilities
0 [Overgrow, Chlorophyll]
1 [Overgrow, Chlorophyll]
2 [Overgrow, Chlorophyll]
3 [Blaze, Solar Power]
4 [Blaze, Solar Power]
5 [Blaze, Solar Power]
6 [Torrent, Rain Dish]
7 [Torrent, Rain Dish]
8 [Torrent, Rain Dish]
9 [Shield Dust, Run Away]
10 [Shed Skin]
11 [Compoundeyes, Tinted Lens]
12 [Shield Dust, Run Away]
13 [Shed Skin]
14 [Swarm, Sniper]
15 [Keen Eye, Tangled Feet, Big Pecks]
16 [Keen Eye, Tangled Feet, Big Pecks]
17 [Keen Eye, Tangled Feet, Big Pecks]
In [1843]: df_pokemon.dtypes
Out[1843]:
abilities object
dtype: object
In [1844]: type(df_pokemon.abilities[0])
Out[1844]: list
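The per-cell conversion that map(ast.literal_eval) performs can be seen in isolation below. This is a minimal sketch with hypothetical list-shaped strings; ast.literal_eval only evaluates Python literals, so unlike eval it will not run arbitrary code from the data.

```python
import ast
from collections import Counter

# Hypothetical cells as they might come from a CSV: list-shaped strings
raw = ["['Overgrow', 'Chlorophyll']", "['Shed Skin']", "['Shed Skin']"]

# Parse each string into a real Python list
parsed = [ast.literal_eval(cell) for cell in raw]
print(type(parsed[0]))  # <class 'list'>

# Once the cells are lists, counting is a simple nested iteration
counts = Counter(name for cell in parsed for name in cell)
print(counts["Shed Skin"])  # 2
```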