Question

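I have two DataFrames: one containing raw data, and another containing a mapping/classifier for that raw data. I want to iterate over the raw data and return a classification based on the other DataFrame.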

df =

Artist  Genres  Image   Popularity  Followers       Americana   Around the World    BritRock    ... Pops    Post-Punk / Angular Progressive Psych'  Punky   Shoegazer / Dreamer Soul / Funk Soundtracks Younger Rap Younget Indie
0   0   Buke and Gase   [brooklyn indie, deep indie rock]   https://i.scdn.co/image/eece57650f99d1265f871a...   32  9328                    ...                                     
1   0   Bright Light Bright Light   [austindie] https://i.scdn.co/image/5234fdee902fe1d4d5ad20...   39  23153                   ...                                     
2   0   Angelo De Augustine [preverb, small room]   https://i.scdn.co/image/3080e9d856e639d539804b...   45  6393                    ...                                     
3   0   Modeselektor    [alternative dance, electronic, indietronica, ...   https://i.scdn.co/image/1bf7a85bcc0c047d8914a2...   50  120084                  ...                                     
4   0   Razorlight  [britpop, garage rock, indie rock, modern rock...   https://i.scdn.co/image/b743a5f809f671be6a60f7...   63  252969                  ...                                     
5 rows × 33 columns

classifier =

spotify_genre   class_one
0   21st century classical  Peaceful Music
1   abstract    Conscious Hip-Hop
2   abstract hip hop    Conscious Hip-Hop
3   abstractro  Experimental / Noise / Drone
4   acid house  Mature Dance Head

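I want to iterate over df['Genres'] and, wherever any string partially matches classifier['spotify_genre'], return a count of 1 to the corresponding column in df, as determined by classifier['class_one']. For example, Buke and Gase has the genre 'brooklyn indie', which should return a '1' in the 'Younger Indie' column of the original df.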

I have tried several approaches but can't find the best way to do this.

Answer 1

I would do this in two steps. First, use a dictionary to map the values into a single column in the original df:

df['class'] = df['Genres'].map(dict)

where dict is a dictionary of the following form:

dict = {'abstract': 'Conscious Hip-Hop', 'abstract hip hop': 'Conscious Hip-Hop', #...

You can then use pandas.get_dummies() on df['class'] to get all the columns you need.
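Since df['Genres'] appears to hold lists of strings and the question asks for partial matches, an exact-key .map may not line up on its own. Below is a minimal sketch of one way to combine the two steps, assuming a "partial match" means that a classifier['spotify_genre'] value occurs as a substring of one of the artist's genres. The toy rows, the 'indie' classifier entry, and the names genre_to_class and classes_for are illustrative assumptions, not part of the original data.

```python
import pandas as pd

# Toy data for illustration only; the 'indie' -> 'Younger Indie' row is a
# hypothetical classifier entry, not taken from the real classifier.
df = pd.DataFrame({
    "Artist": ["Buke and Gase", "Modeselektor"],
    "Genres": [["brooklyn indie", "deep indie rock"],
               ["alternative dance", "electronic"]],
})
classifier = pd.DataFrame({
    "spotify_genre": ["indie", "acid house"],
    "class_one": ["Younger Indie", "Mature Dance Head"],
})

# Step 1: build the genre -> class dictionary from the classifier frame.
genre_to_class = dict(zip(classifier["spotify_genre"], classifier["class_one"]))

def classes_for(genres):
    """Return the set of classes whose spotify_genre is a substring of any genre."""
    return {cls
            for key, cls in genre_to_class.items()
            for genre in genres
            if key in genre}

matched = df["Genres"].apply(classes_for)

# Step 2: one 0/1 indicator column per class (similar in spirit to get_dummies).
for cls in sorted(set(genre_to_class.values())):
    df[cls] = matched.apply(lambda found, c=cls: int(c in found))

print(df[["Artist", "Younger Indie", "Mature Dance Head"]])
```

Substring matching is deliberately loose: a short key such as 'abstract' would also match 'abstract hip hop', so the keys may need tightening depending on how strict the classification should be.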

Classify a pandas DataFrame based on another using string contains
