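I am trying to take artists with matching IDs, whose music can span all sorts of unusual forms or genre combinations, and give each artist one column per genre.
This is what I want to end up with: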
Artist | Id | Genre                | Jazz | Blues | Rock | Trap | Rap | Hip-Hop | Pop | Rb
------------------------------------------------------------------------------------------
Bob    | 1  | [Jazz, Blues]        | 1    | 1     | 0    | 0    | 0   | 0       | 0   | 0
Fred   | 2  | [Rock, Jazz]         | 1    | 0     | 1    | 0    | 0   | 0       | 0   | 0
Jeff   | 3  | [Trap, Rap, Hip-Hop] | 0    | 0     | 0    | 1    | 1   | 1       | 0   | 0
Amy    | 4  | [Pop, Rock, Jazz]    | 1    | 0     | 1    | 0    | 0   | 0       | 1   | 0
Mary   | 5  | [Hip-Hop, Jazz, Rb]  | 1    | 0     | 0    | 0    | 0   | 1       | 0   | 1
This is the error I get:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-50-7a4ed81e14d7> in <module>
     11 for index, row in artist_df.iterrows():
     12     x.append(index)
---> 13     for i in row['genre']:
     14         artists_with_genres.at[index, genre] = 1
     15
TypeError: 'float' object is not iterable
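A `TypeError: 'float' object is not iterable` at this point usually means that at least one row's genre value is not a list but a float, most often NaN from a missing entry. The following is a minimal sketch of how such rows could be located. The sample data is hypothetical and not taken from the asker's dataframe.

```python
import pandas as pd

# Hypothetical sample data: the second artist has no genre recorded.
artist_df = pd.DataFrame({
    "artist": ["Bob", "Fred"],
    "genres": ["Jazz|Blues", float("nan")],
})

# str.split turns strings into lists but leaves missing values missing.
artist_df["genres"] = artist_df["genres"].str.split("|")

# NaN is a float, so a `for genre in row['genres']` loop fails on it.
# The problem rows are the ones whose value is not a list.
not_a_list = ~artist_df["genres"].apply(lambda v: isinstance(v, list))
print(artist_df[not_a_list])
```

If this prints any rows on the real data, those rows are the likely source of the error.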
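These artist genres are attributes that, combined with other factors (such as year, songs, or demographics), help identify similar artists.
The new columns I am creating will indicate whether an artist belongs to a given genre: a 1 or 0 records whether the artist is rock, hip-hop, trap, and so on, i.e. a binary representation of the attribute.
I want to go through the current dataframe and split the genres into individual genres so that I can convert them to the 1/0 binary representation.
Do I need to set the genre as the index?
The dataframe currently looks like this: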
Artist | Id | Genre
--------------------------------
Bob    | 1  | Jazz|Blues
Fred   | 2  | Rock|Jazz
Jeff   | 3  | Trap|Rap|Hip-Hop
Amy    | 4  | Pop|Rock|Jazz
Mary   | 5  | Hip-Hop|Jazz|Rb
Each genre is separated by |, so we only need to call split on |.
artist_df['genres'] = artist_df.genres.str.split('|')
artist_df.head()
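First, make a deep copy of the dataframe.
artists_with_genres = artist_df.copy(deep=True)
Then iterate over the dataframe and add each of the artist's genres as a column of 1s and 0s.
A cell is 1 if the artist in that row has the column's genre, otherwise 0.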
x = []
for index, row in artist_df.iterrows():
    x.append(index)
    for genre in row['genres']:
        artists_with_genres.at[index, genre] = 1
**Confirm that every row has been iterated and acted upon.**
print(len(x) == len(artist_df))
artists_with_genres.head(30)
Fill the NaN values with 0 to show that the artist does not have that column's genre.
artists_with_genres = artists_with_genres.fillna(0)
artists_with_genres.head(3)
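The loop above can be made tolerant of missing genres by skipping any row whose value is not a list. This is only a sketch under assumptions: the sample data, including the artist "Zed" with no genre, is hypothetical, and the column names follow the code above rather than the tables.

```python
import pandas as pd

# Hypothetical sample data; Zed has no genre recorded.
artist_df = pd.DataFrame({
    "artist": ["Bob", "Fred", "Zed"],
    "genres": ["Jazz|Blues", "Rock|Jazz", float("nan")],
})
artist_df["genres"] = artist_df["genres"].str.split("|")

artists_with_genres = artist_df.copy(deep=True)

for index, row in artist_df.iterrows():
    genres = row["genres"]
    # Skip rows where the split did not yield a list (e.g. NaN).
    if not isinstance(genres, list):
        continue
    for genre in genres:
        artists_with_genres.at[index, genre] = 1

# Cells never set to 1 are NaN; fill them with 0 and store as integers.
genre_cols = artists_with_genres.columns.difference(artist_df.columns)
artists_with_genres[genre_cols] = (
    artists_with_genres[genre_cols].fillna(0).astype(int)
)
print(artists_with_genres)
```

Rows without a genre end up with 0 in every genre column instead of raising an error.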
Answer 0 (score: 4)
Try using get_dummies:
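df['Genre'] = df['Genre'].str.split('|')
# One-hot encode every (artist, genre) pair, then sum the rows back per artist.
# groupby(level=0).sum() is used because sum(level=0) is not available in pandas 2.0+.
dfx = pd.get_dummies(pd.DataFrame(df['Genre'].tolist()).stack()).groupby(level=0).sum()
df = pd.concat([df, dfx], axis=1).drop(columns=['Genre'])
print(df)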
  Artist  Id  Blues  Hip-Hop  Jazz  Pop  Rap  Rb  Rock  Trap
0    Bob   1      1        0     1    0    0   0     0     0
1   Fred   2      0        0     1    0    0   0     1     0
2   Jeff   3      0        1     0    0    1   0     0     1
3    Amy   4      0        0     1    1    0   0     1     0
4   Mary   5      0        1     1    0    0   1     0     0
For a detailed explanation, see: Pandas column of lists to separate columns
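As a possible alternative to the approach in this answer (a different technique, not the one the answer uses), pandas also provides `Series.str.get_dummies`, which splits on a separator and builds the indicator columns in one step, so the column does not need to be split into lists first. A minimal sketch, assuming the Genre column still holds the original |-separated strings:

```python
import pandas as pd

# Sample data mirroring the question's dataframe.
df = pd.DataFrame({
    "Artist": ["Bob", "Fred", "Jeff", "Amy", "Mary"],
    "Id": [1, 2, 3, 4, 5],
    "Genre": ["Jazz|Blues", "Rock|Jazz", "Trap|Rap|Hip-Hop",
              "Pop|Rock|Jazz", "Hip-Hop|Jazz|Rb"],
})

# One indicator column per distinct genre, split on the | separator.
dummies = df["Genre"].str.get_dummies(sep="|")

# Attach the indicator columns next to the original columns.
result = pd.concat([df, dummies], axis=1)
print(result)
```

This keeps the original Genre column; drop it afterwards if only the indicator columns are needed.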