So this is the script I'm running. It outputs fine on Windows, but on Ubuntu it just prints an empty list.
import urllib2
import os
import re
import csv
from bs4 import BeautifulSoup
useragent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1309.0 Safari/537.17'
def main():
    # lib-talkingpointsmemo.py
    archive = 'http://talkingpointsmemo.com/archive.php'
    getweeklinks(archive)

def getweeklinks(archivelink):
    print 'something'
    urls = []
    request = urllib2.Request(archivelink, headers={'User-agent': useragent})
    webpage = urllib2.urlopen(request).read()
    soup = BeautifulSoup(webpage)
    anchors = soup('a')
    print anchors
    for a in anchors:
        print a['href']

if __name__ == '__main__': main()
And the output:
something
[]
What's going wrong? I'm using Ubuntu 12.04.1 LTS.
Answer 0 (score: 3)
soup = BeautifulSoup(webpage,"html.parser")
...to make sure the same parser is used in both the Windows and Ubuntu tests. When you don't name a parser, BeautifulSoup 4 picks the "best" one installed on that machine, which can differ between systems and produce different results. You may also want to try some of the other parser options.
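As a rough illustration of that suggestion (the comparison loop and the parser list are just an example, not part of the original answer), you could fetch the page once and see how many anchors each installed parser finds:

import urllib2
from bs4 import BeautifulSoup

useragent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1309.0 Safari/537.17'
archive = 'http://talkingpointsmemo.com/archive.php'

request = urllib2.Request(archive, headers={'User-agent': useragent})
webpage = urllib2.urlopen(request).read()

# "html.parser" ships with the standard library; "lxml" and "html5lib"
# are only available if those packages are installed (e.g. pip install lxml html5lib).
for parser in ('html.parser', 'lxml', 'html5lib'):
    try:
        soup = BeautifulSoup(webpage, parser)
    except Exception as e:
        print parser, 'not available:', e
        continue
    print parser, 'found', len(soup('a')), 'anchors'

If the anchor counts differ between parsers, that confirms the empty list on Ubuntu comes from the default parser choice rather than from the request itself.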