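I need to create a user-item matrix whose values are each user's vote (1-5) for a business, or 0 if the user has not voted for that business. The first 20 rows of my CSV file look like this: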
user_id business_id stars
0 bv2nCi5Qv5vroFiqKGopiw AEx2SYEUJmTxVVB18LlCwA 5
1 bv2nCi5Qv5vroFiqKGopiw VR6GpWIda3SfvPC-lg9H3w 5
2 bv2nCi5Qv5vroFiqKGopiw CKC0-MOWMqoeWf6s-szl8g 5
3 bv2nCi5Qv5vroFiqKGopiw ACFtxLv8pGrrxMm6EgjreA 4
4 bv2nCi5Qv5vroFiqKGopiw s2I_Ni76bjJNK9yG60iD-Q 4
5 _4iMDXbXZ1p1ONG297YEAQ 8QWPlVQ6D-OExqXoaD2Z1g 5
6 u0LXt3Uea_GidxRW1xcsfg 9_CGhHMz8698M9-PkVf0CQ 4
7 u0LXt3Uea_GidxRW1xcsfg gkCorLgPyQLsptTHalL61g 4
8 u0LXt3Uea_GidxRW1xcsfg 5r6-G9C4YLbC7Ziz57l3rQ 3
9 u0LXt3Uea_GidxRW1xcsfg fDF_o2JPU8BR1Gya--jRIA 5
10 u0LXt3Uea_GidxRW1xcsfg z8oIoCT1cXz7gZP5GeU5OA 4
11 u0LXt3Uea_GidxRW1xcsfg XWTPNfskXoUL-Lf32wSk0Q 3
12 u0LXt3Uea_GidxRW1xcsfg 13nKUHH-uEUXVZylgxchPA 1
13 u0LXt3Uea_GidxRW1xcsfg RtUvSWO_UZ8V3Wpj0n077w 3
14 u0LXt3Uea_GidxRW1xcsfg Aov96CM4FZAXeZvKtsStdA 5
15 u0LXt3Uea_GidxRW1xcsfg 0W4lkclzZThpx3V65bVgig 4
16 u0LXt3Uea_GidxRW1xcsfg fdnNZMk1NP7ZhL-YMidMpw 1
17 u0LXt3Uea_GidxRW1xcsfg PFPUMF38-lraKzLcTiz5gQ 3
18 u0LXt3Uea_GidxRW1xcsfg oWTn2IzrprsRkPfULtjZtQ 3
19 u0LXt3Uea_GidxRW1xcsfg zgQHtqX0gqMw1nlBZl2VnQ 1
For these 20 rows, the following code works fine:
import pandas as pd
import numpy as np
proba_filepath = 'H:\\YelpData\\prob.csv'
df = pd.read_csv(proba_filepath, usecols=['user_id','business_id','stars'])
user_votes = df.pivot_table(index='user_id', columns='business_id', values='stars').fillna(0)
But my full CSV file has 5 million rows, and when I try to run df.pivot_table I get the following error: "negative dimensions are not allowed"
Is there an approach that works for this amount of data?
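One direction that might work here, as a rough sketch rather than a confirmed fix: a dense pivot allocates one cell for every (user, business) combination, while a sparse matrix stores only the votes that actually exist. The example below assumes scipy is installed; the helper name build_sparse_user_item and the tiny sample frame are made up for illustration and are not part of the original code.

```python
import pandas as pd
from scipy.sparse import csr_matrix


def build_sparse_user_item(df):
    """Build a sparse user-item matrix from long-format votes.

    Hypothetical helper: expects columns 'user_id', 'business_id', 'stars'.
    Returns (matrix, user_labels, business_labels).
    """
    # Average duplicate (user, business) pairs, mirroring pivot_table's
    # default aggregation (mean).
    votes = df.groupby(['user_id', 'business_id'], as_index=False)['stars'].mean()

    # Map the string ids to consecutive integer codes.
    users = votes['user_id'].astype('category')
    businesses = votes['business_id'].astype('category')

    matrix = csr_matrix(
        (
            votes['stars'].to_numpy(dtype='float64'),
            (
                users.cat.codes.to_numpy(dtype='int64'),
                businesses.cat.codes.to_numpy(dtype='int64'),
            ),
        ),
        shape=(len(users.cat.categories), len(businesses.cat.categories)),
    )
    # Cells that are not stored are implicitly 0, which matches fillna(0).
    return matrix, users.cat.categories, businesses.cat.categories


# Tiny made-up sample, only to show the shape of the result.
sample = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2'],
    'business_id': ['a', 'b', 'c'],
    'stars': [5, 4, 3],
})
user_votes_sparse, user_labels, business_labels = build_sparse_user_item(sample)
```

Row i of the matrix corresponds to user_labels[i] and column j to business_labels[j]. Calling .toarray() on the full 5-million-row result would recreate the dense matrix and likely run into the same memory problem, so the result is best kept sparse.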