Python/C API cannot import the urllib2 library

Date: 2017-02-22 19:19:28

Tags: c macos python-2.7 beautifulsoup urllib2

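I have to parse the HTML at a URL and return a list of the URLs it contains, so that I can apply the same parsing recursively. I am using BeautifulSoup on Mac OS, and I have a problem importing html_parser.py from C.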

html_parser.py:

#!/usr/local/bin/python2.7
from bs4 import BeautifulSoup
import urllib2


def link_list(urlString):
    """Return the href of every <a> tag found at urlString."""
    siteFile = urllib2.urlopen(urlString)
    try:
        siteHTML = siteFile.read()
    finally:
        siteFile.close()
    soup = BeautifulSoup(siteHTML, "html.parser")
    hrefs = []
    for link in soup.find_all('a'):
        href = link.get('href')  # None when the tag has no href attribute
        print(href)
        hrefs.append(href)
    return hrefs

pars.c:

#include <Python.h>  /* must be included before any standard header */
#include <stdio.h>

int main() {
    Py_Initialize();

    /* 1st: Import the module */
    PyRun_SimpleString("from bs4 import BeautifulSoup\n");
    PySys_SetPath(".");  /* replaces the whole of sys.path with "." */
    PyObject* moduleString = PyString_FromString("html_parser");
    if (!moduleString) {
        PyErr_Print();
        printf("Error formatting python script\n");
        Py_Finalize();
        return 1;
    }

    PyObject* module = PyImport_Import(moduleString);
    Py_DECREF(moduleString);
    if (!module) {
        PyErr_Print();
        printf("Error importing python script\n");
        Py_Finalize();
        return 1;
    }

    /* 2nd: Getting reference to the function */
    PyObject* function = PyObject_GetAttrString(module, "link_list");
    if (!function) {
        PyErr_Print();
        printf("Could not find link_list() in html_parser\n");
        Py_DECREF(module);
        Py_Finalize();
        return 1;
    }

    Py_DECREF(function);
    Py_DECREF(module);
    Py_Finalize();
    return 0;
}

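I need to call PySys_SetPath(".") to set the Python path to my working directory. Once I do that, however, bs4 is no longer recognized, so I run PyRun_SimpleString("from bs4 import BeautifulSoup\n") before changing the path. When I try the same thing for urllib2 (PyRun_SimpleString("import urllib2\n")), I get this error: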

  Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 94, in <module>
    import httplib
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 80, in <module>
    import mimetools
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/mimetools.py", line 6, in <module>
    import tempfile
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tempfile.py", line 32, in <module>
    import io as _io
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/io.py", line 51, in <module>
    import _io
ImportError: dlopen(/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_io.so, 2): Symbol not found: __PyCodecInfo_GetIncrementalDecoder
  Referenced from: /usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_io.so
  Expected in: flat namespace
 in /usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_io.so

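Someone helped me with an earlier problem: my Python file used to be named parser.py, and when PySys_SetPath is not called, the C program picks up a different file of that name from the default Python path. So the interpreter embedded through Python.h recognizes neither bs4 nor urllib2.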
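To illustrate the path problem described above, here is a minimal Python sketch that contrasts replacing the search path with extending it. PySys_SetPath(".") sets sys.path to the single entry ".", which is why packages found through the previous entries stop resolving. The helper name prepend_search_dir is hypothetical and exists only for this illustration.

```python
import sys


def prepend_search_dir(search_path, directory):
    """Return a copy of search_path with directory moved to the front."""
    kept = [entry for entry in search_path if entry != directory]
    return [directory] + kept


# What sys.path effectively becomes after PySys_SetPath("."):
replaced = ["."]

# What it becomes when "." is put in front of the existing entries instead:
extended = prepend_search_dir(sys.path, ".")

print("replaced: %d entries, extended: %d entries"
      % (len(replaced), len(extended)))
```

From C, a similar effect can probably be had by running PyRun_SimpleString("import sys\nsys.path.insert(0, '.')\n") in place of PySys_SetPath("."), so that html_parser is found in the working directory while the entries that lead to bs4 and the standard library stay in place. Whether this also clears the _io.so error is not something the post confirms.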

Edit:

I just checked the Python version with PyRun_SimpleString("print(sys.version)") and got this:

2.7.10 (default, Jul 30 2016, 19:40:32)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)]

So I am on Python 2.7.10 rather than Python 3, and I do not have to use the urllib.request module...
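The version check can be widened a little. The traceback names library files under a 2.7.13 directory while sys.version reports 2.7.10, so it may help to see which prefix and search path the embedded interpreter is actually using. The sketch below is only a diagnostic aid; interpreter_report is a hypothetical helper name, and the code could be saved to a file and run from the C program, or run directly with each Python installed on the machine, to compare the results.

```python
import sys


def interpreter_report():
    """Collect the facts that identify the running interpreter."""
    return {
        "version": sys.version.split()[0],
        "prefix": sys.prefix,
        "executable": sys.executable,
        "path": list(sys.path),
    }


report = interpreter_report()
for key in ("version", "prefix", "executable"):
    print("%s: %s" % (key, report[key]))
for entry in report["path"]:
    print("path entry: %s" % entry)
```

If the reported prefix and the directories in the traceback belong to different installations, that mismatch would be one plausible place to look for the cause of the missing symbol, though the post itself does not establish this.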

0 answers:

No answers