import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv('ml-100k/u.user', sep='|', names=u_cols, encoding='latin-1')
r_cols = ['user_id','movie_id','rating', 'unix_timestamp']
ratings = pd.read_csv('ml-100k/u.data', sep="\t", names=r_cols, encoding='latin-1')
Answer 0: (score: 0)
The CSV file itself may contain duplicate column names.
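One way to check this is to read only the first line of the file and look for repeated names before handing it to `pd.read_csv`. The sketch below is only an illustration: the `duplicated_names` helper and the `sample` text are hypothetical, and the sample stands in for a file that happens to have a header row (the MovieLens `u.user` file in the question is loaded with explicit `names`, so the same helper is also run on `u_cols`).

```python
import csv
import io
from collections import Counter


def duplicated_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]


# Hypothetical pipe-separated sample whose header row repeats 'age'.
sample = "user_id|age|age|occupation\n1|24|M|technician\n"

# Read only the first row and treat it as the header.
header = next(csv.reader(io.StringIO(sample), delimiter="|"))
print(duplicated_names(header))  # ['age']

# The column list from the question contains no repeats.
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
print(duplicated_names(u_cols))  # []
```

If the helper returns an empty list for both the file header and the `names` argument, duplicate column names are probably not the cause.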
Answer 1: (score: 0)
I also had to work with the MovieLens dataset, and loading it with your code does not raise any error:
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv('ml-100k/u.user', sep='|', names=u_cols, encoding='latin-1')
r_cols = ['user_id','movie_id','rating', 'unix_timestamp']
ratings = pd.read_csv('ml-100k/u.data', sep="\t", names=r_cols, encoding='latin-1')
users.head()
Out[36]:
   user_id  age sex  occupation zip_code
0        1   24   M  technician    85711
1        2   53   F       other    94043
2        3   23   M      writer    32067
3        4   24   M  technician    43537
4        5   33   F       other    15213
ratings.head()
Out[37]:
   user_id  movie_id  rating  unix_timestamp
0      196       242       3       881250949
1      186       302       3       891717742
2       22       377       1       878887116
3      244        51       2       880606923
4      166       346       1       886397596
Answer 2: (score: 0)
Try using this version of pandas:
pip install pandas==0.20.0
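Before pinning a version, it can help to confirm which one is currently installed. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the `installed_version` helper is a hypothetical name introduced here for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed pandas version, or None if pandas is not installed.
print(installed_version("pandas"))
```

Comparing this value before and after the `pip install` shows whether the pinned version actually replaced the one in the active environment.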