Question

This is my first networking program in Python. I want to count the occurrences of a specific word (e.g. football) on the Google and FIFA home pages.

1) On Google

def wordOnTheWebGoogle():
    import urllib2
    import re    
    page = urllib2.urlopen("http://www.google.com").read()
    print re.findall("football",page)
    print page.find("football")

Output

[]

-1

2) On the FIFA home page

def wordOnTheWebFifa():
    import urllib2
    import re    
    page = urllib2.urlopen("http://www.fifa.com").read()
    print re.findall("football",page)
    print page.find("football")

Output

wordOnTheWebFifa()
Traceback (most recent call last):

  File "<ipython-input-51-4e40573ed4fb>", line 1, in <module>
    wordOnTheWebFifa()

  File "D:L12Problem.py", line 21, in wordOnTheWebFifa
    page = urllib2.urlopen("http://www.fifa.com").read()

  File "C:\Anaconda\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)

  File "C:\Anaconda\lib\urllib2.py", line 410, in open
    response = meth(req, response)

  File "C:\Anaconda\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)

  File "C:\Anaconda\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)

  File "C:\Anaconda\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)

  File "C:\Anaconda\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

HTTPError: Forbidden

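I expected that at least the Google search would return something, but it did not. Can someone help me with these two problems? And for fifa.com, why do I get this Forbidden message?

Regards, Adjeiinfo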

Answer 1

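Question 1:

You did not find "football" at "www.google.com" because "football" does not appear on that page. Load www.google.com in a browser and see whether you can find "football" there.

If you want to search the page that Google returns when you search for "football", you could simulate pressing the "Google Search" button on that page. As you will see if you look at the source of google.com, finding the form fields in that huge block of code and working out how to submit them is not trivial. And, as noted in the comments, it may violate the terms of use.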

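Question 2:

Why urllib2 fails to load www.fifa.com is a mystery. I can't see that you are doing anything wrong, and it fails the same way for me. The only thing I can think of is that urllib2 is not sending some header information that the fifa.com server requires, so the request is rejected (the "Forbidden" error seems to be telling us that it is fifa.com that is refusing our connection).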
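To test the header guess, one option is to send an explicit User-Agent header and catch the HTTP error so the status code can be inspected. The sketch below is written for Python 3, where urllib2 became urllib.request (in Python 2, urllib2.Request accepts the same headers argument). The names build_request, fetch_page and BROWSER_USER_AGENT are made up for illustration, and the header value is an arbitrary browser-like string. Whether this actually gets past the 403 from fifa.com is untested.

```python
import urllib.request
import urllib.error

# Assumed value: an arbitrary browser-like string, not something fifa.com is known to require.
BROWSER_USER_AGENT = "Mozilla/5.0 (compatible; word-counter-example)"


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_page(url):
    """Download a page as text, or return None if the server answers with an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as response:
            # Fall back to UTF-8 when the server does not declare a charset
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as err:
        # err.code holds the numeric status, e.g. 403 for Forbidden
        print("Request refused:", err.code, err.reason)
        return None
```

Calling fetch_page("http://www.fifa.com") would then either return the page text or print the status code the server sent back.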

I suggest you use a better library for web access. This does what you want:

import requests
import re
def wordOnTheWebFifa():
    page = requests.get("http://www.fifa.com").text
    print re.findall("football", page)
    print page.find("football")

wordOnTheWebFifa()

Result:

mgregory$ python foo.py
[u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football', u'football']
2569
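The output above is the list of matches followed by the index of the first match, not a count. Since the question asks for the number of occurrences, a small helper along these lines may be closer to what is wanted. It is only a sketch in Python 3 syntax: count_word and sample are hypothetical names, and the page text fetched in the code above would take the place of sample.

```python
import re


def count_word(text, word, ignore_case=False):
    """Count non-overlapping occurrences of a literal word in text."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape makes sure characters such as "." are matched literally
    return len(re.findall(re.escape(word), text, flags))


sample = "Football news: football fixtures and football results"
print(count_word(sample, "football"))                    # prints 2
print(count_word(sample, "football", ignore_case=True))  # prints 3
```

For a plain case-sensitive count, text.count(word) gives the same number without a regular expression.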

Python: searching for a word on a given web page
