Cannot read the MovieLens 1M dataset's ratings.dat file with pandas

Time: 2013-02-25 06:00:54

Tags: python pandas

Hobbyist - Python newbie

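Hello, I am working through Wes McKinney's book Python for Data Analysis. I have just started on the MovieLens 1M dataset, but I cannot get my code to work on the ratings.dat file. The same code works for movies.dat and users.dat, yet I keep getting an error for ratings.dat. I downloaded copies of ratings.dat from both GitHub and movielens.org and got the same error each time. Renaming the file did not help, and neither did moving it to another directory. Is this some kind of configuration problem?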


Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
Type "copyright", "credits" or "license" for more information.

IPython 0.13.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
%guiref   -> A brief reference about the graphical user interface.

Welcome to pylab, a matplotlib-based Python environment [backend: TkAgg].
For more information, type 'help(pylab)'.

import pandas as pd

rnames = ['user_id','movie_id','rating','timestamp']

ratings = pd.read_table('e:\ratings.dat',sep='',header=None,names=rnames)

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-1-5513dd9baafa> in <module>()
      3 rnames = ['user_id','movie_id','rating','timestamp']
      4 
----> 5 ratings = pd.read_table('e:\ratings.dat',sep='',header=None,names=rnames)
      6 

E:\Python27_new\lib\site-packages\pandas\io\parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, nrows, iterator, chunksize, verbose, encoding, squeeze)
    397                     buffer_lines=buffer_lines)
    398 
--> 399         return _read(filepath_or_buffer, kwds)
    400 
    401     parser_f.__name__ = name

E:\Python27_new\lib\site-packages\pandas\io\parsers.pyc in _read(filepath_or_buffer, kwds)
    206 
    207     # Create the parser.
--> 208     parser = TextFileReader(filepath_or_buffer, **kwds)
    209 
    210     if nrows is not None:

E:\Python27_new\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, engine, **kwds)
    505             self.options['has_index_names'] = kwds['has_index_names']
    506 
--> 507         self._make_engine(self.engine)
    508 
    509     def _get_options_with_defaults(self, engine):

E:\Python27_new\lib\site-packages\pandas\io\parsers.pyc in _make_engine(self, engine)
    607     def _make_engine(self, engine='c'):
    608         if engine == 'c':
--> 609             self._engine = CParserWrapper(self.f, **self.options)
    610         else:
    611             if engine == 'python':

E:\Python27_new\lib\site-packages\pandas\io\parsers.pyc in __init__(self, src, **kwds)
    888         # #2442
    889         kwds['allow_leading_cols'] = self.index_col is not False
--> 890         self._reader = _parser.TextReader(src, **kwds)
    891 
    892         # XXX

E:\Python27_new\lib\site-packages\pandas\_parser.pyd in pandas._parser.TextReader.__cinit__ (pandas\src\parser.c:2771)()

E:\Python27_new\lib\site-packages\pandas\_parser.pyd in pandas._parser.TextReader._setup_parser_source (pandas\src\parser.c:4810)()

atings.dat does not exist

The last line of the error always cuts off the first part of the file name. As mentioned above, the same code works for movies.dat and users.dat.

2 answers:

Answer 0 (score: 2)

Try escaping the backslash in the source path: change e:\ratings.dat to e:\\ratings.dat

Answer 1 (score: 1)

You should write the path string as a raw string (note the r in front of it):

ratings = pd.read_table(r'e:\ratings.dat', sep='', header=None, names=rnames)

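Your version fails because \r has a special meaning inside an ordinary string literal: it is a carriage return. The backslash and the r are therefore not read as part of the file path, so Python looks for a file that does not exist. The same code works for movies.dat and users.dat because \m and \u are not escape sequences in a Python 2.7 byte string, so the backslash is kept there. In a raw string, a backslash never starts an escape sequence and is kept as a literal character. You can see the difference here: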

In [1]: print ('\r')


In [2]: print (r'\r')
\r
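The truncated message in the question ("atings.dat does not exist") fits this explanation. The sketch below is only an illustration: the message text is a hypothetical stand-in, not the exact wording pandas produces, and the split on '\r' imitates a console that restarts the line at a carriage return.

```python
# The path exactly as typed in the question: '\r' is parsed as ONE character.
bad_path = 'e:\ratings.dat'
good_path = r'e:\ratings.dat'   # raw string: backslash and 'r' stay separate

print(len(bad_path))            # one character shorter than good_path
print(len(good_path))
print('\r' in bad_path)         # the carriage return is really in the string

# Hypothetical error text, for illustration only.
message = bad_path + ' does not exist'

# A console that honours the carriage return jumps back to the start of the
# line, so only the text after the last '\r' remains visible.
visible = message.split('\r')[-1]
print(visible)
```

Since 'e:' is lost together with the backslash and the r, the visible remainder starts at "atings.dat", which matches what the traceback shows.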

Alternatively, you can escape each \ character (using \\), as @pravin suggested.
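As a rough sketch of the options, the snippet below compares the two spellings suggested in the answers and adds a third, forward slashes, which is not from the answers but which Windows generally accepts in file paths. The variable names are illustrative, and ntpath is used only so the comparison applies Windows path rules on any platform.

```python
import ntpath  # Windows path rules, importable on every platform

raw = r'e:\ratings.dat'       # raw string, as in answer 1
escaped = 'e:\\ratings.dat'   # doubled backslash, as in answer 0
forward = 'e:/ratings.dat'    # forward slash (an addition, not from the answers)

# Both suggested spellings produce the same 14-character string.
print(raw == escaped)

# Under Windows path rules the forward-slash form normalises to the same path.
print(ntpath.normpath(forward) == raw)

# Any of these three can be passed to pd.read_table in place of
# 'e:\ratings.dat'; the other arguments of the call stay unchanged.
```

Whichever spelling is chosen, the point is the same: the string that reaches pandas must contain a real backslash (or a slash), not a carriage return.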