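There is a problem I don't understand. I have two custom modules in the same directory. Note that the code below is deliberately shortened: I show the function that doesn't work, and another, similar one that does. Let's see: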
module_parsing.py
```python
import re


def hippodrome_numreu_prix(response):
    """Parse the hippodrome's name, number of the meeting and prize's name.
    response -- html page. Arrival table.
    returns a tuple as (hippodrome's name, nb of meeting, prize's name)
    """
    # "hippodrome" means racecourse, but this word exists in English too.
    # There are several races on a hippodrome on a particular day; that's called a meeting ("reunion" in French).
    # On a particular day there are several meetings too, which is why we call it the number of meeting: n°1, n°2...
    # A race is for a prize ("prix" in French). This prize has a name.
    hip_num = response.xpath("//html//h1[@class='CourseHeader-title']/text()").extract()
    hip_num = ''.join(hip_num)
    # HIPPODROME
    if re.search('-\s{0,5}([A-zÀ-ÿ|-|\s]+)\s{0,5}/', hip_num):
        hippo = re.search('-\s{0,5}([A-zÀ-ÿ|-|\s]+)\s{0,5}/', hip_num).group(1).lower().replace(' ','')
    else:
        hippo = None
    # NUMBER OF MEETING
    if re.search('R[0-9]+', hip_num):
        num_reunion = re.search('R[0-9]+', hip_num).group()
    else:
        num_reunion = 'PMH'
    # PRIZE
    prix = response.xpath("//html//h1[@class='CourseHeader-title']/strong/text()").extract_first().lower()
    return (hippo,num_reunion,prix)


def allocation_devise(response):
    """Parse the amount of allowance and currency (€, £, $, etc)
    response -- html page. Arrival table.
    returns a tuple as (allowance's amount, currency's symbol)
    """
    # "allocation" means allowance: the sum of the prizes for the winners (1st, 2nd, 3rd, etc.).
    # "devise" means currency. Depending on the country of the hippodrome, the allowance is expressed in different currencies.
    alloc_devise = response.xpath("//html//div[@class='row-fluid row-no-margin text-left']/p[2]/text()[2]").extract_first()
    # ALLOWANCE
    if re.search(r'[0-9]+',alloc_devise.replace('.','')):
        alloc = int(re.search(r'[0-9]+',alloc_devise.replace('.','')).group())
    else:
        alloc = None
    # CURRENCY
    if re.search(r'([A-Z|a-z|£|$|€]+)',alloc_devise):
        devise = re.search(r'([A-Z|a-z|£|$|€]+)',alloc_devise).group()
    else:
        devise = None
    return (alloc, devise)
```
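As a side note on the two functions above: each field runs `re.search()` twice, once in the `if` and once to read the match. A minimal sketch of the usual alternative: search once, keep the match object, and only call `.group()` when it is not `None`. It works on plain strings rather than a Scrapy `response`; `parse_header`, `parse_allocation` and the sample strings are hypothetical, since the real page text isn't shown here. The hippodrome pattern is copied unchanged; the currency class is simplified to `[A-Za-z£$€]`, because `|` inside `[...]` is a literal character rather than an alternation (likewise, `A-z` also covers `[`, `\`, `]`, `^`, `_` and the backtick).

```python
import re

# Hippodrome and meeting patterns copied from module_parsing.py; the currency
# class is simplified to [A-Za-z£$€] because "|" inside [...] is literal.
HIPPO_RE = re.compile(r'-\s{0,5}([A-zÀ-ÿ|-|\s]+)\s{0,5}/')
REUNION_RE = re.compile(r'R[0-9]+')
DIGITS_RE = re.compile(r'[0-9]+')
DEVISE_RE = re.compile(r'[A-Za-z£$€]+')


def parse_header(hip_num):
    """Return (hippodrome, meeting number) from the joined <h1> text."""
    m = HIPPO_RE.search(hip_num)  # search once; None means no match
    hippo = m.group(1).lower().replace(' ', '') if m else None
    m = REUNION_RE.search(hip_num)
    num_reunion = m.group() if m else 'PMH'
    return hippo, num_reunion


def parse_allocation(alloc_devise):
    """Return (amount, currency) from a text such as '35.000 €'."""
    if alloc_devise is None:  # extract_first() gives None when the XPath matches nothing
        return None, None
    m = DIGITS_RE.search(alloc_devise.replace('.', ''))
    alloc = int(m.group()) if m else None
    m = DEVISE_RE.search(alloc_devise)
    devise = m.group() if m else None
    return alloc, devise


# Hypothetical inputs; the real page text is not shown in the question.
print(parse_header("R1 - Châteaubriant / "))  # ('châteaubriant', 'R1')
print(parse_header(""))                       # (None, 'PMH')
print(parse_allocation("35.000 €"))           # (35000, '€')
```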
module_correction.py
It depends on the previous one:
```python
from module_parsing import *


def fonction_parse_correction(champ_corrige, response):
    dico_fonction_parse = {
        'allocation': allocation_devise,
        'devise': allocation_devise,
        'hippodrome': hippodrome_numreu_prix,
        'reunion': hippodrome_numreu_prix,
        'prix': hippodrome_numreu_prix,
    }
    if champ_corrige in {'allocation','hippodrome'}:
        return dico_fonction_parse[champ_corrige](response)[0]
    elif champ_corrige in {'devise','reunion'}:
        return dico_fonction_parse[champ_corrige](response)[1]
    elif champ_corrige in {'prix'}:
        return dico_fonction_parse[champ_corrige](response)[2]
```
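For what it's worth, the `if`/`elif` chain in `fonction_parse_correction()` can be folded into the dictionary itself by storing the tuple index next to each function. A minimal sketch, with hypothetical stub parsers standing in for the real ones: they ignore `response` and return the values seen in the shell session below, plus an assumed `'€'` currency. Raising `ValueError` on an unknown field is a design choice of this sketch; the original silently returns `None`.

```python
# Hypothetical stubs standing in for module_parsing's functions; they ignore
# `response` and return fixed values ('€' is an assumed currency).
def hippodrome_numreu_prix(response):
    return ('châteaubriant', 'R1', 'prix synergie')


def allocation_devise(response):
    return (35000, '€')


# Each corrected field maps to (parsing function, index in its returned tuple).
CHAMPS = {
    'allocation': (allocation_devise, 0),
    'devise': (allocation_devise, 1),
    'hippodrome': (hippodrome_numreu_prix, 0),
    'reunion': (hippodrome_numreu_prix, 1),
    'prix': (hippodrome_numreu_prix, 2),
}


def fonction_parse_correction(champ_corrige, response):
    """Return the single corrected field parsed from `response`."""
    try:
        fonction, index = CHAMPS[champ_corrige]
    except KeyError:
        raise ValueError('unknown field: %r' % champ_corrige) from None
    return fonction(response)[index]


print(fonction_parse_correction('hippodrome', None))  # châteaubriant
print(fonction_parse_correction('allocation', None))  # 35000
```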
Now, when I test my function in the Scrapy shell:
```
scrapy shell https://www.paris-turf.com/programme-courses/2019-07-09/reunion-chateaubriant/resultats-rapports/prix-synergie-1150124
#Here I change the path with sys.path.insert() and only import module_correction
In [1]: from module_correction import *
In [2]: fonction_parse_correction('hippodrome',response)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-34-e82fe121aab0> in <module>
----> 1 fonction_parse_correction('hippodrome',response)

~/.../scrapy_project/scrapy_project/module_correction.py in fonction_parse_correction(champ_corrige, response)
    103     if champ_corrige in {'allocation','hippodrome'}:
--> 104         return dico_fonction_parse[champ_corrige](response)[0]
    105     elif champ_corrige in {'devise','reunion'}:

~/.../scrapy_project/scrapy_project/module_parsing.py in hippodrome_numreu_prix(response)
    157     #HIPPODROME
--> 158     if re.search('-\s{0,5}([A-zÀ-ÿ|-|\s]+)\s{0,5}/', hip_num):
    160         hippo = re.search('-\s{0,5}([A-zÀ-ÿ|-|\s]+)\s{0,5}/', hip_num).group(1).lower().replace(' ','')
    161     else:

AttributeError: 'NoneType' object has no attribute 'group'
```
It gets weird when I run `allocation_devise()` through `fonction_parse_correction()`, because that works:
```
In [3]: fonction_parse_correction('allocation',response)
Out[3]: 35000
```
It gets even weirder when I simply copy and paste the `hippodrome_numreu_prix()` function into the shell and run it on its own, like this:
```
In [4]: hippodrome_numreu_prix(response)
Out[4]: ('châteaubriant', 'R1', 'prix synergie')
```
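So here I can clearly see that there is no problem with a `None` value, because `search()` clearly finds a match. Moreover, going through the `dict` does not seem to be the problem either, since the similar function `allocation_devise()` works fine that way, even with the same `response` argument.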
I can't see what the problem is. What am I missing?
Note: Ubuntu 18.04, Scrapy 1.5.2, Python 3.7.1
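One thing that may be worth ruling out: the traceback reports a `.group` error on line 158, which only calls `re.search(...)` inside the `if`. IPython typically prints traceback source lines from the files as they currently are on disk, while the code actually running may come from an earlier import still cached in `sys.modules` (the `ipython-input-34` label suggests the session was not fresh). A second `import` does not re-read a changed file; `importlib.reload()` does. The sketch below demonstrates that caching behaviour with a hypothetical throwaway module, `demo_parsing`, written to a temporary directory; it illustrates Python's import cache and is not a confirmed diagnosis of this particular error.

```python
import importlib
import os
import sys
import tempfile

# Don't write .pyc files, so every (re)load compiles the current source.
sys.dont_write_bytecode = True
# Start clean in case this hypothetical name was imported before.
sys.modules.pop('demo_parsing', None)

# Hypothetical throwaway module standing in for module_parsing.py.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'demo_parsing.py')
with open(path, 'w', encoding='utf-8') as f:
    f.write("def parse():\n    return 'old'\n")

sys.path.insert(0, tmpdir)
import demo_parsing  # first import: executes the file and caches it in sys.modules

# Edit the file on disk, as you would in an editor during a shell session.
with open(path, 'w', encoding='utf-8') as f:
    f.write("def parse():\n    return 'new version'\n")

import demo_parsing  # no-op: returns the cached module object
before = demo_parsing.parse()

importlib.reload(demo_parsing)  # re-executes the current file
after = demo_parsing.parse()

print(before, after)  # old new version
```

With `from module_parsing import *`, `module_correction` keeps its own references to the functions it imported, so in this setup one would likely need to reload `module_parsing` first, then `module_correction`, and then re-run `from module_correction import *` in the shell.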