I created an array of every possible chain of HTML tags; at runtime, each chain either navigates to the target string or yields nothing.
group = ['div','span','a','link','dl','dt','dd','b','p','meta','']
comb = []
for g1 in group:
    if g1 != '':
        for g2 in group:
            if g2 != '':
                for g3 in group:
                    if g3 != '':
                        res = "tag."+g1+"."+g2+"."+g3+".string"
                        comb.append(res)
                    else:
                        res = "tag."+g1+"."+g2+".string"
                        comb.append(res)
            else:
                res = "tag."+g1+".string"
                comb.append(res)
I want to run every entry in the array to see what it returns for a given website.
import re
import requests
from bs4 import BeautifulSoup

def get_web_price(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'}
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, "lxml")
    tag = soup.find(class_=re.compile("price"))
    for c in comb:
        exec(c, globals())
Is it possible to run the strings in the list, the way exec() would?
I am using BeautifulSoup, Requests, Googlesearch, and Re on Python 3.
Answer 0 (score: 1)
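You don't need exec() or eval() for dynamic attribute access. Use getattr(), or, in the case of BeautifulSoup, use the find() method to get the first descendant that matches the given criteria: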
import re
from itertools import chain, product

import requests
from bs4 import BeautifulSoup

group = ['div','span','a','link','dl','dt','dd','b','p','meta']
# Produce a list of tuples of element names (1, 2 and 3 names long)
comb = list(chain(*[product(*[group] * n) for n in range(1, 4)]))

def get_web_price(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'}
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, "lxml")
    tag = soup.find(class_=re.compile("price"))
    if tag is None:
        # No element with a "price" class, so there is nothing to walk
        return
    for c in comb:
        t = tag
        # Descend one element name at a time; stop at the first miss
        for a in c:
            t = t.find(a)
            if t is None:
                break
        if t is None:
            continue
        # Do something with t.string
        t.string
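For completeness, the getattr() route mentioned at the top can be sketched like this. It is only an illustration under assumptions: `walk` is a hypothetical helper, and the `SimpleNamespace` objects are made-up stand-ins for a parsed tag so the snippet runs without a network call or BeautifulSoup. On a real BeautifulSoup tag, `getattr(tag, 'div')` behaves like `tag.div`, which is itself shorthand for `tag.find('div')`.

```python
from types import SimpleNamespace


def walk(node, path):
    """Follow the attribute names in `path` one by one.

    Returns None as soon as a step is missing (hypothetical helper).
    """
    for name in path:
        node = getattr(node, name, None)
        if node is None:
            return None
    return node


# Stand-in for a parsed tag (made-up data, not real BeautifulSoup output)
tag = SimpleNamespace(
    div=SimpleNamespace(
        span=SimpleNamespace(string="19.99"),
    ),
)

found = walk(tag, ("div", "span"))
missing = walk(tag, ("div", "b"))
print(found.string if found is not None else None)
print(missing)
```

Passing the names as a tuple keeps the lookup as plain data, so no string ever has to be executed.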
I think you can also use select() with a limit to the same effect:
def get_web_price(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'}
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, "lxml")
    tag = soup.find(class_=re.compile("price"))
    for c in comb:
        selector = ' '.join(c)
        r = tag.select(selector, limit=1)
        if r:
            r = r[0]
        else:
            continue
        r.string
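Both versions loop over comb. As a quick sanity check of what that list holds, and of the selector strings the select() version ends up building, here is a small standard-library sketch. The name `selectors` is introduced here purely for illustration, and `chain.from_iterable` with `product(..., repeat=n)` is just an equivalent spelling of the expression used above.

```python
from itertools import chain, product

group = ['div', 'span', 'a', 'link', 'dl', 'dt', 'dd', 'b', 'p', 'meta']

# All tuples of 1, 2 and 3 element names: 10 + 100 + 1000 = 1110 entries
comb = list(chain.from_iterable(product(group, repeat=n) for n in range(1, 4)))

# Joining with a space gives a CSS descendant selector, e.g. "div span"
selectors = [' '.join(c) for c in comb]

print(len(comb))
print(comb[0], comb[10], comb[110])
print(selectors[10], '|', selectors[-1])
```

That is one select() or find() chain per entry for every page, so it may be worth trimming group to the tags you actually expect around a price.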
I take no position on whether scraping Google search results is a good idea.