Web scraping (error when using Jupyter)

Asked: 2016-11-05 18:43:45

Tags: python jupyter-notebook

This is my first time using Python and all the related packages and tools, so I tried to follow the example given in this lecture. Here is the code:

    import pandas as pd

    # pass in column names for each CSV
    u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']

    users = pd.read_csv(
        'http://files.grouplens.org/datasets/movielens/ml-100k/u.user',
        sep='|', names=u_cols)

    users.head()

When I execute the code in Jupyter, all I get is this error:
    URLErrorTraceback (most recent call last)
    <ipython-input-4-cd2489d7386f> in <module>()
          6 users = pd.read_csv(
          7     'http://files.grouplens.org/datasets/movielens/ml-100k/u.user',
    ----> 8     sep='|', names=u_cols)
          9 
         10 users.head()

    /opt/conda/envs/python2/lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
        560                     skip_blank_lines=skip_blank_lines)
        561 
    --> 562         return _read(filepath_or_buffer, kwds)
        563 
        564     parser_f.__name__ = name

    /opt/conda/envs/python2/lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
        299     filepath_or_buffer, _, compression = get_filepath_or_buffer(
        300         filepath_or_buffer, encoding,
    --> 301         compression=kwds.get('compression', None))
        302     kwds['compression'] = (inferred_compression if compression == 'infer'
        303                            else compression)

    /opt/conda/envs/python2/lib/python2.7/site-packages/pandas/io/common.pyc in get_filepath_or_buffer(filepath_or_buffer, encoding, compression)
        306 
        307     if _is_url(filepath_or_buffer):
    --> 308         req = _urlopen(str(filepath_or_buffer))
        309         if compression == 'infer':
        310             content_encoding = req.headers.get('Content-Encoding', None)

    /opt/conda/envs/python2/lib/python2.7/urllib2.pyc in urlopen(url, data, timeout, cafile, capath, cadefault, context)
        152     else:
        153         opener = _opener
    --> 154     return opener.open(url, data, timeout)
        155 
        156 def install_opener(opener):

    /opt/conda/envs/python2/lib/python2.7/urllib2.pyc in open(self, fullurl, data, timeout)
        427             req = meth(req)
        428 
    --> 429         response = self._open(req, data)
        430 
        431         # post-process response

    /opt/conda/envs/python2/lib/python2.7/urllib2.pyc in _open(self, req, data)
        445         protocol = req.get_type()
        446         result = self._call_chain(self.handle_open, protocol, protocol +
    --> 447                                   '_open', req)
        448         if result:
        449             return result

    /opt/conda/envs/python2/lib/python2.7/urllib2.pyc in _call_chain(self, chain, kind, meth_name, *args)
        405             func = getattr(handler, meth_name)
        406 
    --> 407             result = func(*args)
        408             if result is not None:
        409                 return result

    /opt/conda/envs/python2/lib/python2.7/urllib2.pyc in http_open(self, req)
       1226 
       1227     def http_open(self, req):
    -> 1228         return self.do_open(httplib.HTTPConnection, req)
       1229 
       1230     http_request = AbstractHTTPHandler.do_request_

    /opt/conda/envs/python2/lib/python2.7/urllib2.pyc in do_open(self, http_class, req, **http_conn_args)
       1196         except socket.error, err: # XXX what error?
       1197             h.close()
    -> 1198             raise URLError(err)
       1199         else:
       1200             try:

    URLError: <urlopen error [Errno -2] Name or service not known>

According to the lecture, the result should look like this.

1 Answer:

Answer 0 (score: 0)

This looks like a network problem (check your internet connection). The code works fine for me:

    >>> users.head()
       user_id  age sex  occupation zip_code
    0        1   24   M  technician    85711
    1        2   53   F       other    94043
    2        3   23   M      writer    32067
    3        4   24   M  technician    43537
    4        5   33   F       other    15213
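To rule out a problem with the parsing arguments themselves, you can feed `read_csv` an in-memory sample instead of the URL; the two rows below are hypothetical but mirror the pipe-separated layout of `u.user`. If this works, the `sep='|'` and `names=u_cols` arguments are fine and the failure is purely network-side:

```python
import io
import pandas as pd

# Same column names as in the question
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']

# Hypothetical rows in the same pipe-separated format as u.user
sample = io.StringIO(
    "1|24|M|technician|85711\n"
    "2|53|F|other|94043\n"
)

users = pd.read_csv(sample, sep='|', names=u_cols)
print(users)
```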

尝试在浏览器中打开网址,检查是否可以从您的计算机加载(http://files.grouplens.org/datasets/movielens/ml-100k/u.user)。
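Since the error is `[Errno -2] Name or service not known` (a DNS resolution failure, not an HTTP error), you can confirm it from Python with a small resolution check; `can_resolve` is a hypothetical helper, not part of pandas:

```python
import socket

def can_resolve(host):
    """Return True if DNS can resolve the host name.

    A failure here corresponds to the 'Name or service not known'
    error seen in the traceback.
    """
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False

# Check the host serving the dataset
print(can_resolve('files.grouplens.org'))
```

If this prints `False`, the notebook's environment (for example, a Docker container without network access) cannot resolve host names, and no `read_csv` call against a URL will work until that is fixed.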