python - How to use pivot_table on large data?

Asked: 2018-10-25 10:05:24

Tags: python matrix large-data

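I need to create a user-item matrix whose values are the users' votes for businesses (1-5), or 0 if the user has not voted for that business. The first 20 rows of my csv file look like this: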

                   user_id             business_id      stars
   0   bv2nCi5Qv5vroFiqKGopiw  AEx2SYEUJmTxVVB18LlCwA      5
   1   bv2nCi5Qv5vroFiqKGopiw  VR6GpWIda3SfvPC-lg9H3w      5
   2   bv2nCi5Qv5vroFiqKGopiw  CKC0-MOWMqoeWf6s-szl8g      5
   3   bv2nCi5Qv5vroFiqKGopiw  ACFtxLv8pGrrxMm6EgjreA      4
   4   bv2nCi5Qv5vroFiqKGopiw  s2I_Ni76bjJNK9yG60iD-Q      4
   5   _4iMDXbXZ1p1ONG297YEAQ  8QWPlVQ6D-OExqXoaD2Z1g      5
   6   u0LXt3Uea_GidxRW1xcsfg  9_CGhHMz8698M9-PkVf0CQ      4
   7   u0LXt3Uea_GidxRW1xcsfg  gkCorLgPyQLsptTHalL61g      4
   8   u0LXt3Uea_GidxRW1xcsfg  5r6-G9C4YLbC7Ziz57l3rQ      3
   9   u0LXt3Uea_GidxRW1xcsfg  fDF_o2JPU8BR1Gya--jRIA      5
   10  u0LXt3Uea_GidxRW1xcsfg  z8oIoCT1cXz7gZP5GeU5OA      4
   11  u0LXt3Uea_GidxRW1xcsfg  XWTPNfskXoUL-Lf32wSk0Q      3
   12  u0LXt3Uea_GidxRW1xcsfg  13nKUHH-uEUXVZylgxchPA      1
   13  u0LXt3Uea_GidxRW1xcsfg  RtUvSWO_UZ8V3Wpj0n077w      3
   14  u0LXt3Uea_GidxRW1xcsfg  Aov96CM4FZAXeZvKtsStdA      5
   15  u0LXt3Uea_GidxRW1xcsfg  0W4lkclzZThpx3V65bVgig      4
   16  u0LXt3Uea_GidxRW1xcsfg  fdnNZMk1NP7ZhL-YMidMpw      1
   17  u0LXt3Uea_GidxRW1xcsfg  PFPUMF38-lraKzLcTiz5gQ      3
   18  u0LXt3Uea_GidxRW1xcsfg  oWTn2IzrprsRkPfULtjZtQ      3
   19  u0LXt3Uea_GidxRW1xcsfg  zgQHtqX0gqMw1nlBZl2VnQ      1

For these 20 rows, the code below works fine:

import pandas as pd
import numpy as np

proba_filepath = 'H:\\YelpData\\prob.csv'

df = pd.read_csv(proba_filepath, usecols=['user_id','business_id','stars'])

user_votes = df.pivot_table(index='user_id', columns='business_id', values='stars').fillna(0)

But my full csv file has 5 million rows, and when I try to run df.pivot_table I get the following error: "negative dimensions are not allowed"
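A likely cause, offered as an assumption rather than a confirmed diagnosis: pivot_table builds a dense array with one cell per (user, business) pair, and with millions of rows that cell count can exceed what a 32-bit integer holds, which some pandas versions may report as a negative dimension. The sketch below only estimates the dense size; `dense_cells` is a hypothetical helper and the small frame stands in for the real file.

```python
import numpy as np
import pandas as pd


def dense_cells(df, row_col="user_id", col_col="business_id"):
    """Return how many cells a dense pivot of df would contain."""
    n_rows = df[row_col].nunique()  # one row per distinct user
    n_cols = df[col_col].nunique()  # one column per distinct business
    return n_rows * n_cols          # plain Python ints, so no overflow here


# Tiny stand-in for the real csv (hypothetical ids).
sample = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "business_id": ["b1", "b2", "b3"],
    "stars": [5, 4, 3],
})

cells = dense_cells(sample)
int32_max = np.iinfo(np.int32).max
print(f"dense cells: {cells}, exceeds int32: {cells > int32_max}")
```

Running the same check on the full file would show whether the dense matrix is too large to build at all, independent of the exact error text.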

Is there a way to do this for such a large amount of data?
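One common approach for a matrix that is mostly zeros is to skip the dense pivot and build a sparse matrix instead. The following is a minimal sketch, assuming SciPy is installed and that a sparse result is acceptable for the later steps; `build_sparse_votes` is a hypothetical helper name, and duplicate (user, business) pairs are averaged to mirror pivot_table's default mean aggregation.

```python
import pandas as pd
from scipy.sparse import csr_matrix


def build_sparse_votes(df):
    """Build a sparse user x business matrix of stars (missing votes are 0)."""
    # Average duplicate (user, business) pairs, like pivot_table's default.
    grouped = df.groupby(["user_id", "business_id"], as_index=False)["stars"].mean()

    # Categorical codes map each id to a compact integer row/column index.
    users = grouped["user_id"].astype("category")
    businesses = grouped["business_id"].astype("category")
    rows = users.cat.codes.to_numpy().astype("int64")
    cols = businesses.cat.codes.to_numpy().astype("int64")

    matrix = csr_matrix(
        (grouped["stars"].to_numpy(dtype="float64"), (rows, cols)),
        shape=(len(users.cat.categories), len(businesses.cat.categories)),
    )
    # Return the labels too, so rows and columns can be mapped back to ids.
    return matrix, users.cat.categories, businesses.cat.categories


# Tiny stand-in for the real csv (hypothetical ids).
sample = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "business_id": ["b1", "b2", "b1"],
    "stars": [5, 4, 3],
})

votes, user_labels, business_labels = build_sparse_votes(sample)
print(votes.shape, votes.nnz)
```

Only the stored votes take memory, so the size scales with the number of rows in the csv rather than with users times businesses. Calling `votes.toarray()` would recreate the dense matrix and is only sensible on small samples.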

0 answers:

No answers yet