I need to search a web page's source for a pattern involving two variables: one is known, and the other is the value I am trying to retrieve.
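from urllib.request import urlopen
from bs4 import BeautifulSoup

def getcpu():
    parse()  # presumably defined elsewhere in the asker's code; populates rt
    for child in rt.iter('proc'):
        proc = child.attrib['name']
        # Strip the vendor marks so the name resembles the one on the site
        cpumodel = proc.replace('(R)', "").replace('(TM)', '').replace('CPU', '')
    return cpumodel

def passmark():
    url = urlopen('https://www.cpubenchmark.net/cpu_list.php').read().decode('utf-8')
    cpu = getcpu()
    soup = BeautifulSoup(url, "html.parser")
    # Finds the text node equal to the CPU name, not the score next to it
    score = soup.find(text=cpu)
    print(score)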
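So var1 is known and must be used for the search, and var2 should somehow be retrieved (the code of course does not work). I only put var2 there to explain what I am trying to achieve. Is this possible? Or is there some way other than regular expressions?
Edit: a better example. Take a single line from the page source: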
<TR id="cpu793"><TD><A HREF="cpu_lookup.php?cpu=Intel+Core+i5-2400+%40+3.10GHz&id=793">Intel Core i5-2400 @ 3.10GHz</A></TD><TD>5965</TD><TD>662</TD><TD><a href="cpu.php?cpu=Intel+Core+i5-2400+%40+3.10GHz&id=793#price">41.15</a></TD><TD><a href="cpu.php?cpu=Intel+Core+i5-2400+%40+3.10GHz&id=793#price">$144.99*</a></TD></TR>
Intel Core i5-2400 @ 3.10GHz is var1, and based on it I am trying to get var2 (5965 in this line).
Answer 0 (score: 0)
As suggested in the comments, consider using BeautifulSoup:
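import bs4

html = '''<TR id="cpu793"><TD><A HREF="cpu_lookup.php?cpu=Intel+Core+i5-2400+%40+3.10GHz&id=793">Intel Core i5-2400 @ 3.10GHz</A></TD><TD>5965</TD><TD>662</TD><TD><a href="cpu.php?cpu=Intel+Core+i5-2400+%40+3.10GHz&id=793#price">41.15</a></TD><TD><a href="cpu.php?cpu=Intel+Core+i5-2400+%40+3.10GHz&id=793#price">$144.99*</a></TD></TR>'''
var1 = 'Intel Core i5-2400 @ 3.10GHz'

# Name the parser explicitly so bs4 does not have to guess one
soup = bs4.BeautifulSoup(html, "html.parser")
# Find the text node that exactly equals the known CPU name
# (older bs4 releases spell this argument text= instead of string=)
result = soup.find(string=var1)
if result:
    # The next <td> in document order holds the score (5965 here)
    var2 = result.find_next('td').text
else:
    print("Not found")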
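One thing the snippet above depends on is an exact string match, and the chained replace() calls in the question's getcpu() can leave a double space where "CPU" used to be, which would make the lookup fail. The sketch below is one possible way to handle both steps, not a definitive implementation. The helper names normalize_cpu_name and lookup_score are hypothetical, the raw name format "Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz" is an assumption about what the asker's XML contains, and the score is assumed to sit in the second cell of the row, as in the sample line from the question.

```python
import re

from bs4 import BeautifulSoup

# Shortened copy of the sample row from the question
SAMPLE_HTML = (
    '<TR id="cpu793"><TD>'
    '<A HREF="cpu_lookup.php?cpu=Intel+Core+i5-2400+%40+3.10GHz&id=793">'
    'Intel Core i5-2400 @ 3.10GHz</A></TD><TD>5965</TD><TD>662</TD></TR>'
)


def normalize_cpu_name(raw_name):
    """Drop vendor marks, then collapse the whitespace they leave behind."""
    cleaned = raw_name.replace("(R)", "").replace("(TM)", "").replace("CPU", "")
    return re.sub(r"\s+", " ", cleaned).strip()


def lookup_score(html, cpu_name):
    """Return the second cell of the row whose link text is cpu_name, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # html.parser lowercases tag names, so "a" also matches <A>
    for link in soup.find_all("a"):
        if link.get_text(strip=True) != cpu_name:
            continue
        row = link.find_parent("tr")
        if row is None:
            continue
        cells = row.find_all("td")
        if len(cells) < 2:
            continue
        # Assumption: the score is the second cell; strip thousands separators
        return int(cells[1].get_text(strip=True).replace(",", ""))
    return None


name = normalize_cpu_name("Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz")
print(lookup_score(SAMPLE_HTML, name))  # 5965
```

Walking up to the enclosing row and indexing its cells keeps the lookup tied to the table layout, so it should keep working even if extra markup is later added between the name and the score. To run it against the live page, pass the decoded download from the question's passmark() as the html argument.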