I often use the urllib2 library to parse web pages in Python. Usually the URL has the form:
    page_url = 'http://www.website.com/webpage.html'
I use this to read the page:
    import urllib2

    def read_page_contents(url):
        try:
            request = urllib2.Request(url)
            handle = urllib2.urlopen(request)
            content = handle.read()
        except:
            # added as suggested by contributors below:
            import traceback
            traceback.print_exc()
            content = None
        return content

    page = read_page_contents(page_url)
    if page is not None:
        # start dealing with page contents
        pass
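This works fine, but when I try a URL without an .html extension, such as
    page_url = 'https://energyplus.net/weather-region/north_and_central_america_wmo_region_4'
the function fails to read the page and always returns None, with this error message:
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    HTTPError: HTTP Error 403: Forbidden
I searched Stack Overflow, but found nothing useful for my keywords.
Please help me solve this problem.
Thanks in advance.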
----------
I found the answer, thanks to the help of the two contributors below:
    import requests

    def read_page_contents(url):
        try:
            request = requests.get(url)
            content = request.content
        except:
            # added as suggested by contributors below:
            import traceback
            traceback.print_exc()
            content = None
        return content
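Answer 0 (score: 2)
This has nothing to do with the fact that your URL has no .html in it. Your code itself is rather confusing: it uses page_url in one place and continent_url in another, so it cannot run as posted. I assume that is a copy-paste issue. The real mistake in your code is
    except:
        content = None
Never do this. If you have a blanket handler that catches all exceptions, you absolutely must log them:
    except:
        import traceback
        traceback.print_exc()
        content = None
You will then see that there is a real problem with the page you are trying to retrieve (it turns out to be a permissions problem).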
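To make the advice above concrete, here is a minimal sketch that catches only the HTTP and URL errors and logs the status code instead of swallowing everything. It assumes Python 3, where urllib2 was split into urllib.request and urllib.error. The opener parameter is a hypothetical hook that is not in the original post; it is there only so the function can be exercised without network access.

```python
import logging
import urllib.error
import urllib.request

logger = logging.getLogger(__name__)


def read_page_contents(url, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if the request fails.

    `opener` is a hypothetical hook (an assumption, not from the post)
    that defaults to urllib.request.urlopen.
    """
    try:
        with opener(url) as handle:
            return handle.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403.
        logger.error("HTTP %s (%s) for %s", err.code, err.reason, url)
    except urllib.error.URLError as err:
        # No usable response at all: DNS failure, refused connection, etc.
        logger.error("Could not reach %s: %s", url, err.reason)
    return None
```

HTTPError is a subclass of URLError, so it has to be caught first. Any other exception (for example a typo in the function itself) still propagates with a full traceback instead of being hidden behind content = None.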
Answer 1 (score: 2)
Use requests and save your time for more meaningful things.
r.status_code: 200
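As a rough sketch of what the requests-based approach might look like with an explicit status check: requests.get does not raise on a 4xx or 5xx response by itself, so calling raise_for_status() is one way to keep an error page from being mistaken for real content. The session and timeout parameters are illustrative assumptions, not part of the original answer, and the function name is simply reused from the question.

```python
import logging

import requests

logger = logging.getLogger(__name__)


def read_page_contents(url, session=None, timeout=10):
    """Return the response body as bytes, or None if the request fails.

    `session` and `timeout` are illustrative parameters (assumptions),
    not taken from the original post.
    """
    # Fall back to the module-level requests.get when no session is given.
    http = session if session is not None else requests
    try:
        response = http.get(url, timeout=timeout)
        # Turn 4xx/5xx statuses (for example 403) into an exception.
        response.raise_for_status()
    except requests.RequestException:
        # Logs the message together with the full traceback.
        logger.exception("Request for %s failed", url)
        return None
    return response.content
```

Called as read_page_contents(page_url), it returns the raw bytes of the page on success and None on any request failure, with the reason written to the log.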