I am working with an open dataset I found, specifically this one: http://files.grouplens.org/datasets/movielens/ml-100k/u.item. I am trying to parse it, and this is how I load it into pandas:
import pandas as pd

movie_cols = ['movie_id', 'title', 'release_date', 'imdb_url']
movies = pd.read_csv('http://files.grouplens.org/datasets/movielens/ml-100k/u.item', sep='|', names=movie_cols)
When I then run
movies.head()
it shows this:
Answer 0 (score: 1)
You need the usecols parameter of read_csv to keep only columns 1, 2, 3 and 5 (zero-based positions 0, 1, 2 and 4):
movie_cols = ['movie_id', 'title', 'release_date', 'imdb_url']
movies = pd.read_csv('http://files.grouplens.org/datasets/movielens/ml-100k/u.item',
                     sep='|',
                     names=movie_cols,
                     encoding='latin-1',
                     usecols=[0, 1, 2, 4])
print(movies.head())
   movie_id              title release_date  \
0         1   Toy Story (1995)  01-Jan-1995
1         2   GoldenEye (1995)  01-Jan-1995
2         3  Four Rooms (1995)  01-Jan-1995
3         4  Get Shorty (1995)  01-Jan-1995
4         5     Copycat (1995)  01-Jan-1995

                                            imdb_url
0  http://us.imdb.com/M/title-exact?Toy%20Story%2...
1  http://us.imdb.com/M/title-exact?GoldenEye%20(...
2  http://us.imdb.com/M/title-exact?Four%20Rooms%...
3  http://us.imdb.com/M/title-exact?Get%20Shorty%...
4  http://us.imdb.com/M/title-exact?Copycat%20(1995)
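If you want to try the same call without downloading anything, here is a minimal sketch that feeds read_csv a two-row in-memory sample. The sample is an assumption, not real data: the rows are shortened to three trailing flag columns, and the second title and URL are made-up placeholders. It only illustrates that, with usecols in place, the four names line up with exactly the four fields that are read, and the empty fourth field is skipped.

```python
import io

import pandas as pd

# Hypothetical, shortened sample in the same pipe-delimited layout:
# id | title | release date | (empty field) | URL | trailing flag columns
sample = (
    "5|Copycat (1995)|01-Jan-1995||"
    "http://us.imdb.com/M/title-exact?Copycat%20(1995)|0|0|1\n"
    "6|Café Example (1995)|01-Jan-1995||"
    "http://example.com/placeholder|0|1|0\n"
).encode("latin-1")  # bytes that are not valid UTF-8 because of the 'é'

movie_cols = ["movie_id", "title", "release_date", "imdb_url"]

movies = pd.read_csv(
    io.BytesIO(sample),
    sep="|",
    names=movie_cols,      # one name per selected column
    encoding="latin-1",    # decode the raw bytes as Latin-1
    usecols=[0, 1, 2, 4],  # keep fields 1, 2, 3 and 5; skip the rest
)

print(movies)
```

If reading the real file without encoding='latin-1' raises a UnicodeDecodeError, the most likely reason is that the file is not UTF-8 encoded, which is what the sample above imitates with the accented title.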