How to fix TypeError: no supported conversion for types: (dtype('<U10'),)

Time: 2019-07-12 08:03:21

Tags: python pandas numpy scipy

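I am trying to build a simple artist recommender system in Python using cosine similarity. The dataset I am using is the last.fm dataset: https://www.kaggle.com/neferfufi/lastfm

I have been following the blog post at https://www.benfrederickson.com/distance-metrics/ and tried to write similar code.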

import pandas as pd
import numpy as np
from numpy import zeros
from collections import defaultdict
from scipy.sparse import csr_matrix
import keras
from keras.layers import dot
url_data = pd.read_csv("stuff.tsv", 
                         usecols=[0, 2, 3], 
                         names=['user', 'artist', 'plays'])

userids = defaultdict(lambda: len(userids))
url_data['userid'] = url_data['user'].map(userids.__getitem__)

artists = dict((artist, csr_matrix(
                (group['plays'], (zeros(len(group)), group['userid'])),
                shape=[1, len(userids)]))
        for artist, group in url_data.groupby('artist'))

SMOOTHING = 20

def newSmoothcosine(a, b):
    overlap = dot(binarize(a), binarize(b).T)[0, 0]

    # smooth cosine by discounting by set intersection
    return (overlap / (SMOOTHING + overlap)) * cosine(a, b)

def binarize(artist):
    ret = csr_matrix(artist)
    ret.data  = ones(len(artist.data))
    return ret

print(newSmoothcosine('Kanye West', 'Jay-Z'))

I expected it to return the smoothed cosine of the angle between the two artists, but instead I get:

TypeError: no supported conversion for types: (dtype('<U10'),)

Please help!

1 answer:

Answer 0: (score: 0)

Here is a possible solution. I am not sure whether it will work, but you could try using a lambda to convert the dtype to float: df.apply(lambda x: x.replace('$', '').replace(',', '')).astype('float')
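
For what it's worth, the dtype in the message, '<U10', is a 10-character Unicode string, which matches 'Kanye West'. That suggests the artist name itself reached csr_matrix through binarize, rather than that artist's sparse vector. Below is a minimal sketch of one possible fix, assuming the intent was to look the vectors up with artists['Kanye West'] and artists['Jay-Z']. The toy play counts, the cosine helper, and the name smoothed_cosine are made up for illustration and are not taken from the question or the blog post.

```python
from collections import defaultdict

import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy data (user, artist, plays); not from the last.fm dataset.
rows = [
    ("u1", "Kanye West", 10),
    ("u2", "Kanye West", 5),
    ("u1", "Jay-Z", 8),
    ("u3", "Jay-Z", 2),
]

# Passing the artist *name* to csr_matrix appears to be what triggers the error.
try:
    csr_matrix("Kanye West")
    error_message = None
except (TypeError, ValueError) as exc:
    error_message = str(exc)

# Map each user to a column index, as in the question.
userids = defaultdict(lambda: len(userids))
by_artist = defaultdict(list)
for user, artist, plays in rows:
    by_artist[artist].append((userids[user], float(plays)))

# One 1 x n_users sparse row vector per artist.
artists = {}
for artist, pairs in by_artist.items():
    cols = [col for col, _ in pairs]
    data = [plays for _, plays in pairs]
    row_idx = np.zeros(len(pairs), dtype=int)
    artists[artist] = csr_matrix((data, (row_idx, cols)), shape=(1, len(userids)))

SMOOTHING = 20


def binarize(vec):
    """Return a copy of the sparse vector with every stored value set to 1."""
    ret = vec.copy()
    ret.data = np.ones(len(vec.data))
    return ret


def cosine(a, b):
    """Plain cosine similarity between two 1 x n sparse row vectors."""
    norm = np.sqrt(a.multiply(a).sum()) * np.sqrt(b.multiply(b).sum())
    return a.dot(b.T)[0, 0] / norm if norm else 0.0


def smoothed_cosine(a, b):
    """Cosine discounted by the number of users the two artists share."""
    overlap = binarize(a).dot(binarize(b).T)[0, 0]
    return (overlap / (SMOOTHING + overlap)) * cosine(a, b)


# Look up the sparse vectors by name instead of passing the strings directly.
similarity = smoothed_cosine(artists["Kanye West"], artists["Jay-Z"])
print(similarity)
```

Note that this sketch uses the sparse matrix's own .dot method and numpy's ones; the dot imported from keras.layers in the question is a layer helper for Keras tensors and is probably not what is wanted here.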