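I am trying to build a simple artist recommendation system in Python using cosine similarity. The dataset I am using is the last.fm dataset: https://www.kaggle.com/neferfufi/lastfm
I have been following the blog post at https://www.benfrederickson.com/distance-metrics/ and tried to write similar code.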
import pandas as pd
import numpy as np
from numpy import zeros
from collections import defaultdict
from scipy.sparse import csr_matrix
import keras
from keras.layers import dot

url_data = pd.read_csv("stuff.tsv",
                       usecols=[0, 2, 3],
                       names=['user', 'artist', 'plays'])

# map each user name to a consecutive integer id
userids = defaultdict(lambda: len(userids))
url_data['userid'] = url_data['user'].map(userids.__getitem__)

# one sparse 1 x n_users row vector of play counts per artist
artists = dict((artist, csr_matrix(
                (group['plays'], (zeros(len(group)), group['userid'])),
                shape=[1, len(userids)]))
               for artist, group in url_data.groupby('artist'))

SMOOTHING = 20

def newSmoothcosine(a, b):
    overlap = dot(binarize(a), binarize(b).T)[0, 0]
    # smooth cosine by discounting by set intersection
    return (overlap / (SMOOTHING + overlap)) * cosine(a, b)

def binarize(artist):
    ret = csr_matrix(artist)
    ret.data = ones(len(artist.data))
    return ret

print(newSmoothcosine('Kanye West', 'Jay-Z'))
I expect it to return the smoothed cosine of the angle between the two artists, but instead I get
TypeError: no supported conversion for types: (dtype('<U10'),)
Please help!
Answer 0 (score: 0)
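Here is one possible solution. I am not sure it will work, but you can try using a lambda to strip the formatting characters and convert the dtype to float:
df.apply(lambda col: col.str.replace('$', '', regex=False).str.replace(',', '', regex=False)).astype('float')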
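The dtype in the traceback, `<U10`, is a 10-character Unicode string, which is exactly the length of 'Kanye West'. So the error most likely comes from passing the artist names themselves to `csr_matrix` inside `binarize`, rather than the sparse vectors stored in `artists`. Below is a minimal sketch of the same idea that runs end to end. The assumptions are mine, not the question's: the tiny DataFrame is a made-up stand-in for the TSV file, the `cosine` helper is written here because the question never defines one, the Keras `dot` layer is replaced with the sparse matrix's own `.dot`, and `smoothed_cosine` is just a renamed `newSmoothcosine`.

```python
from collections import defaultdict

import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

SMOOTHING = 20

# Made-up play counts standing in for the last.fm file (assumption).
data = pd.DataFrame({
    "user": ["u1", "u2", "u3", "u1", "u2", "u4"],
    "artist": ["Kanye West"] * 3 + ["Jay-Z"] * 3,
    "plays": [10.0, 5.0, 2.0, 8.0, 1.0, 7.0],
})

# Map each user name to a consecutive integer id.
userids = defaultdict(lambda: len(userids))
data["userid"] = data["user"].map(userids.__getitem__)

# One sparse 1 x n_users row vector of play counts per artist.
artists = {
    artist: csr_matrix(
        (group["plays"].to_numpy(dtype=float),
         (np.zeros(len(group), dtype=int), group["userid"].to_numpy())),
        shape=(1, len(userids)),
    )
    for artist, group in data.groupby("artist")
}


def binarize(vec):
    """Return a copy of vec with every stored value set to 1."""
    ret = csr_matrix(vec, copy=True)
    ret.data = np.ones(len(ret.data))
    return ret


def cosine(a, b):
    """Plain cosine similarity between two sparse row vectors."""
    norm = np.sqrt(a.multiply(a).sum()) * np.sqrt(b.multiply(b).sum())
    return a.dot(b.T)[0, 0] / norm if norm else 0.0


def smoothed_cosine(a, b):
    """Cosine similarity discounted by the number of shared listeners."""
    overlap = binarize(a).dot(binarize(b).T)[0, 0]
    return (overlap / (SMOOTHING + overlap)) * cosine(a, b)


# Look the vectors up by name instead of passing the names themselves.
score = smoothed_cosine(artists["Kanye West"], artists["Jay-Z"])
print(score)
```

With the real file, `pd.read_csv` would probably also need `sep="\t"`, since the default separator is a comma and the file is tab-separated.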