I'm trying to do something like this:
from BeautifulSoup import BeautifulSoup
import urllib2,unicodedata
import re
for x in range(1,105):
html_page = urllib2.urlopen('http://xxxxxx/BUSCAR/H=1;OR=5;ST=;LIST_ART_PAGENUMBER='+str(x)+';/Dxxxxx.aspx')
soup = BeautifulSoup(html_page)
for link in soup.findAll('a', attrs={'href': re.compile("^http://xxxxxx/PRODUCTO/PROD_ID")}):
print link.get('href')
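to extract links. Extracting the links from a single page works correctly, but I want to extract them from pages 1 through 105.
But this doesn't work! I get: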
error: expected an indented block
Answer (score: 1)
The body of a for loop must be indented; that includes the nested for loop and the print inside it. Try this:
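from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, Python 2
import urllib2, unicodedata
import re

# range() excludes its stop value, so range(1, 106) covers pages 1 through 105
for x in range(1, 106):
    html_page = urllib2.urlopen('http://xxxxxx/BUSCAR/H=1;OR=5;ST=;LIST_ART_PAGENUMBER=' + str(x) + ';/Dxxxxx.aspx')
    soup = BeautifulSoup(html_page)
    # The nested loop is indented one level deeper, and its body one more
    for link in soup.findAll('a', attrs={'href': re.compile("^http://xxxxxx/PRODUCTO/PROD_ID")}):
        print link.get('href')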
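For readers on Python 3, where urllib2 and the print statement no longer exist, here is a rough sketch of the same idea. It assumes the standard library's html.parser in place of BeautifulSoup 3, which is a swapped-in technique and not the answer's code. The host and path stay elided ("xxxxxx", "Dxxxxx.aspx") exactly as in the question. The helper names `page_urls`, `ProductLinkParser` and `product_links` are hypothetical. Building the URLs and filtering the links are kept as separate steps so the filter can be checked on a sample string without any network access.

```python
import re
from html.parser import HTMLParser

# Host and page path are elided in the question; these are placeholders, not real URLs.
URL_TEMPLATE = "http://xxxxxx/BUSCAR/H=1;OR=5;ST=;LIST_ART_PAGENUMBER={};/Dxxxxx.aspx"
PRODUCT_HREF = re.compile(r"^http://xxxxxx/PRODUCTO/PROD_ID")


def page_urls(first=1, last=105):
    """Return the listing-page URLs for pages first..last, inclusive."""
    # range() excludes its stop value, so add 1 to include `last`.
    return [URL_TEMPLATE.format(n) for n in range(first, last + 1)]


class ProductLinkParser(HTMLParser):
    """Collect the href values of <a> tags whose href matches PRODUCT_HREF."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and PRODUCT_HREF.match(href):
            self.links.append(href)


def product_links(html):
    """Return the matching product links found in one page of HTML."""
    parser = ProductLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = (
        '<a href="http://xxxxxx/PRODUCTO/PROD_ID=1">A</a>'
        '<a href="http://example.org/other">B</a>'
    )
    print(product_links(sample))  # prints ['http://xxxxxx/PRODUCTO/PROD_ID=1']

    # Fetching the real pages might look like this (not run here; assumes UTF-8 pages):
    # from urllib.request import urlopen
    # for url in page_urls():
    #     html = urlopen(url).read().decode("utf-8", errors="replace")
    #     for link in product_links(html):
    #         print(link)
```

The key point carries over from the answer: each nested block sits one indentation level deeper than the line that opens it, and `range(1, 106)` (here `last + 1`) is needed to reach page 105.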