Question

I wrote the following code to scrape some data:

import urllib.request, re

def get_content(page):
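    # fill the page number into the URL; with a hard-coded curPage=1, .format(page) had no effect
    url = 'https://www.liepin.com/zhaopin/?sfrom=click-pc_homepage-centre_searchbox-search_new&key=python&curPage={}'.format(page)
    a = urllib.request.urlopen(url)
    html = a.read()
    html = html.decode('utf-8')
    # print(html)  # uncomment to inspect the HTML the server actually returned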
    return html

def get(html):
    reg = re.compile(r'class="job-info" >[^.]+<span class="job-name" title="(.*?)" >.*?',re.S)
    items = re.findall(reg, html)
    return items

for j in range(1, 10):
    html = get_content(j)

    for i in get(html):
        print(i)
        # append each title on its own line, using UTF-8 for the Chinese text
        with open("liepin.txt", 'a', encoding='utf-8') as f:
            f.write(i + '\n')
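One way to see what `urlopen` actually returns (the commented-out `print(html)` hints at this) is to fetch a single page and check for the class names the regex depends on. This is a minimal diagnostic sketch, assuming the site might serve different HTML to a script than to a browser; that cause is not confirmed by the question. The browser-like `User-Agent` value and the helper names `build_request` and `diagnose` are hypothetical.

```python
import urllib.request

# Hypothetical browser-like User-Agent; whether liepin.com inspects it is an assumption.
HEADERS = {"User-Agent": "Mozilla/5.0"}
BASE = ("https://www.liepin.com/zhaopin/?sfrom=click-pc_homepage-centre_searchbox-search_new"
        "&key=python&curPage={}")


def build_request(page):
    """Build a request for one results page, with the page number filled in."""
    return urllib.request.Request(BASE.format(page), headers=HEADERS)


def diagnose(page):
    """Fetch one page and report whether the markers the regex needs are present."""
    with urllib.request.urlopen(build_request(page)) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    for marker in ('class="job-info"', 'class="job-name"'):
        print(marker, "found" if marker in html else "NOT found")
    return html
```

Calling `diagnose(1)` makes a real network request. If both markers come back as NOT found, the regex cannot match that response, regardless of how it behaves on HTML copied from a browser.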

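However, it doesn't print anything. I suspected the regular expression might be the cause, so I checked it, but Regex Pal told me the regex is correct and that it matches the HTML.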
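Since the pattern was checked in Regex Pal against HTML pasted into it, it is worth re-testing it with Python's own `re` module. The sketch below uses a made-up fragment (an assumption, not the real liepin.com markup) to show one plausible failure: `[^.]+` cannot cross any `.` character, so something like `www.liepin.com` sitting between the two class attributes blocks the match, while a non-greedy `.*?` with `re.S` does not.

```python
import re

# Hypothetical HTML fragment; note the "." characters in the href
# that sit between class="job-info" and class="job-name".
SAMPLE = '''
<div class="job-info" >
  <a href="https://www.liepin.com/job/123.shtml">
  <span class="job-name" title="Python Developer" >
</div>
'''

# Pattern from the question: [^.]+ stops at the first "." it meets.
original = re.compile(r'class="job-info" >[^.]+<span class="job-name" title="(.*?)" >', re.S)
# Non-greedy .*? (re.S lets "." match newlines too) can cross the dots.
relaxed = re.compile(r'class="job-info" >.*?<span class="job-name" title="(.*?)" >', re.S)

print(original.findall(SAMPLE))  # []
print(relaxed.findall(SAMPLE))   # ['Python Developer']
```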

So can someone tell me what the problem is and how to fix it?

Answer 1

Use an HTML parser such as BeautifulSoup instead of a regex to solve this: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
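A minimal sketch of the BeautifulSoup approach, assuming the `bs4` package is installed and that each job title sits in the `title` attribute of a `span.job-name` inside a `.job-info` element (inferred from the question's regex, not checked against the live site). `job_titles` and `SAMPLE` are hypothetical names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded results page.
SAMPLE = '''
<div class="job-info" >
  <a href="https://www.liepin.com/job/123.shtml">
  <span class="job-name" title="Python Developer" >Python Developer</span></a>
</div>
<div class="job-info" >
  <span class="job-name" title="Data Engineer" >Data Engineer</span>
</div>
'''


def job_titles(html):
    """Return the title attribute of every span.job-name nested in a .job-info element."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser backend, no lxml needed
    return [span.get("title") for span in soup.select(".job-info span.job-name")]


print(job_titles(SAMPLE))  # ['Python Developer', 'Data Engineer']
```

Selecting by class this way does not care what text or punctuation lies between the two tags, which is exactly what the `[^.]+` part of the regex was sensitive to.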
