Pandas - Count and get unique string values from a column

Time: 2017-10-14 10:46:37

Tags: python regex pandas split

My column has more than 800 rows and looks like this:

0                            ['Overgrow', 'Chlorophyll']
1                            ['Overgrow', 'Chlorophyll']
2                            ['Overgrow', 'Chlorophyll']
3                               ['Blaze', 'Solar Power']
4                               ['Blaze', 'Solar Power']
5                               ['Blaze', 'Solar Power']
6                               ['Torrent', 'Rain Dish']
7                               ['Torrent', 'Rain Dish']
8                               ['Torrent', 'Rain Dish']
9                            ['Shield Dust', 'Run Away']
10                                         ['Shed Skin']
11                       ['Compoundeyes', 'Tinted Lens']
12                           ['Shield Dust', 'Run Away']
13                                         ['Shed Skin']
14                                   ['Swarm', 'Sniper']
15             ['Keen Eye', 'Tangled Feet', 'Big Pecks']
16             ['Keen Eye', 'Tangled Feet', 'Big Pecks']
17             ['Keen Eye', 'Tangled Feet', 'Big Pecks']

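What do I want?

  1. I want to count how many times each string value occurs.
  2. I also want to collect the unique string values into a list.

Here is what I did to get the second part: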

    import re

    list_ability = df_pokemon['abilities'].tolist()
    new_list = []
    for row in list_ability:
        # Each row is a string such as "['Overgrow', 'Chlorophyll']"; pull out the quoted names
        new_list.extend(re.findall(r"'(.*?)'", row, re.DOTALL))

    list1 = set(new_list)

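I was able to put the unique string values into a list, but is there a better way?

Example:

'Overgrow' - 3

'Chlorophyll' - 3

'Blaze' - 3

'Shield Dust' - 2 ... and so on

(By the way, the column is named 'abilities' and comes from the dataframe df_pokemon.)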

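2 answers:

Answer 0: (score: 3)

Since the values are strings, you can use a regex and split to turn them into lists, then count with Counter from collections, as @JonClements mentioned in the comments, i.e.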

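import pandas as pd
from collections import Counter
# regex=True is required on pandas >= 2.0; splitting on ', ' avoids leading spaces in the keys
count = pd.Series(df_pokemon['abilities'].str.replace(r"[\[\]']", '', regex=True).str.split(', ').map(Counter).sum())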

Output:

Big Pecks        3
Chlorophyll      3
Rain Dish        3
Run Away         2
Sniper           1
Solar Power      3
Tangled Feet     3
Tinted Lens      1
Blaze            3
Compoundeyes     1
Keen Eye         3
Overgrow         3
Shed Skin        2
Shield Dust      2
Swarm            1
Torrent          3
dtype: int64

To list only the values that occur exactly once, use count[count==1].index.tolist()

['Sniper', 'Tinted Lens', 'Compoundeyes', 'Swarm']

To get every distinct value as a list, convert the index:

count.index.tolist()
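
For comparison, here is a minimal sketch of the same count that parses the strings with ast.literal_eval instead of a regex. It assumes every cell is a valid Python list literal, as in the rows shown in the question. The small sample frame and the names parsed, ability_counts and unique_abilities are made up for illustration.

```python
import ast
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical sample built from a few rows of the question; values are strings, not lists.
df_pokemon = pd.DataFrame({
    "abilities": [
        "['Overgrow', 'Chlorophyll']",
        "['Overgrow', 'Chlorophyll']",
        "['Shed Skin']",
        "['Swarm', 'Sniper']",
    ]
})

# Turn each string into a real list, then count every ability in a single pass.
parsed = df_pokemon["abilities"].map(ast.literal_eval)
ability_counts = Counter(chain.from_iterable(parsed))

# The Counter keys are the distinct abilities; sort them for a stable order.
unique_abilities = sorted(ability_counts)

print(ability_counts.most_common())
print(unique_abilities)
```

Counting once over the flattened values avoids building and summing one Counter per row, which may matter on larger frames.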

Answer 1: (score: 2)

Use value_counts

In [1845]: counts = pd.Series(np.concatenate(df_pokemon.abilities)).value_counts()

In [1846]: counts
Out[1846]:
Rain Dish       3
Keen Eye        3
Chlorophyll     3
Blaze           3
Solar Power     3
Overgrow        3
Big Pecks       3
Tangled Feet    3
Torrent         3
Shield Dust     2
Shed Skin       2
Run Away        2
Compoundeyes    1
Swarm           1
Tinted Lens     1
Sniper          1
dtype: int64

For the unique values, you can use

In [1850]: counts.index.tolist()
Out[1850]:
['Rain Dish','Keen Eye', 'Chlorophyll', 'Blaze', 'Solar Power', 'Overgrow', 
 'Big Pecks', 'Tangled Feet', 'Torrent', 'Shield Dust', 'Shed Skin', 'Run Away',
 'Compoundeyes', 'Swarm', 'Tinted Lens', 'Sniper']

Or,

In [1849]: np.unique(np.concatenate(df_pokemon.abilities))
Out[1849]:
array(['Big Pecks', 'Blaze', 'Chlorophyll', 'Compoundeyes', 'Keen Eye',
       'Overgrow', 'Rain Dish', 'Run Away', 'Shed Skin', 'Shield Dust',
       'Sniper', 'Solar Power', 'Swarm', 'Tangled Feet', 'Tinted Lens',
       'Torrent'],
      dtype='|S12')
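
As a side note, np.unique can return the counts along with the distinct values when called with return_counts=True. The sketch below uses a small hand-written list of lists rather than the real column; the names ability_lists, flat, values, freqs and counts are illustrative only.

```python
import numpy as np

# Hypothetical sample: a few rows from the question, already as Python lists.
ability_lists = [
    ["Overgrow", "Chlorophyll"],
    ["Overgrow", "Chlorophyll"],
    ["Shed Skin"],
    ["Swarm", "Sniper"],
]

# Flatten the ragged lists into one 1-D array of strings.
flat = np.concatenate(ability_lists)

# values comes back sorted; freqs[i] is the number of occurrences of values[i].
values, freqs = np.unique(flat, return_counts=True)
counts = dict(zip(values.tolist(), freqs.tolist()))

print(counts)
```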

Note: as pointed out in Jon's comments, if type(df_pokemon.abilities[0]) is not list, convert the values to lists first:

import ast
df_pokemon.abilities = df_pokemon.abilities.map(ast.literal_eval)

Details of the DataFrame used in this answer:

In [1842]: df_pokemon
Out[1842]:
                              abilities
0               [Overgrow, Chlorophyll]
1               [Overgrow, Chlorophyll]
2               [Overgrow, Chlorophyll]
3                  [Blaze, Solar Power]
4                  [Blaze, Solar Power]
5                  [Blaze, Solar Power]
6                  [Torrent, Rain Dish]
7                  [Torrent, Rain Dish]
8                  [Torrent, Rain Dish]
9               [Shield Dust, Run Away]
10                          [Shed Skin]
11          [Compoundeyes, Tinted Lens]
12              [Shield Dust, Run Away]
13                          [Shed Skin]
14                      [Swarm, Sniper]
15  [Keen Eye, Tangled Feet, Big Pecks]
16  [Keen Eye, Tangled Feet, Big Pecks]
17  [Keen Eye, Tangled Feet, Big Pecks]

In [1843]: df_pokemon.dtypes
Out[1843]:
abilities    object
dtype: object

In [1844]: type(df_pokemon.abilities[0])
Out[1844]: list
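
On newer pandas versions (0.25 and later, so after this answer was written), Series.explode offers another way to flatten the lists before calling value_counts. The following is a sketch under that assumption, starting from string cells as in the question. The sample frame and the names abilities, unique_values and once_only are hypothetical.

```python
import ast

import pandas as pd

# Hypothetical sample: a few rows from the question, stored as strings.
df_pokemon = pd.DataFrame({
    "abilities": [
        "['Overgrow', 'Chlorophyll']",
        "['Overgrow', 'Chlorophyll']",
        "['Shed Skin']",
        "['Swarm', 'Sniper']",
    ]
})

# Parse the strings into lists, as described in the note above.
abilities = df_pokemon["abilities"].map(ast.literal_eval)

# explode() puts each list element on its own row; value_counts() then tallies them.
counts = abilities.explode().value_counts()

unique_values = counts.index.tolist()            # every distinct ability
once_only = counts[counts == 1].index.tolist()   # abilities that appear exactly once

print(counts)
```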
