I am trying to extract the links (href) that start with a specific word, but my code returns an empty list even though the page source contains many links that meet the condition. I must be missing something.
Answer 0 (score: 0)
Try this:
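import requests
from bs4 import BeautifulSoup
import string
import os
import re


def extract_href_page(page):
    # Name the parser explicitly so the result does not depend on which parsers are installed
    soup = BeautifulSoup(page, "html.parser")
    all_links = []
    # href=True keeps only <a> tags that actually carry an href attribute
    links = soup.find_all('a', href=True)
    # pattern = re.compile(r'\w*recette')
    print(links)
    for link in links:
        # re.match anchors at the start of the href; re.I ignores case
        if re.match(r"\w*first_word", link["href"], re.I):
            all_links.append(link.get("href"))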
...
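
One possible reason for an empty list with the pattern above: re.match only matches at the very start of the string, and \w does not match "/" or ":", so an href such as "/first_word/page" or "https://site/first_word" is rejected. The following is a minimal sketch of one way around that, assuming "starts with" refers to the first part of the URL path. The helper name hrefs_starting_with and the sample hrefs are hypothetical and only for illustration.

```python
import re
from urllib.parse import urlparse


def hrefs_starting_with(hrefs, first_word):
    """Keep the hrefs whose URL path begins with first_word (case-insensitive)."""
    # re.escape so that characters such as "." in first_word are taken literally
    pattern = re.compile(re.escape(first_word), re.I)
    kept = []
    for href in hrefs:
        # Drop scheme, host and leading slashes so that "/recette/x" and
        # "https://example.com/recette/x" both begin with "recette"
        path = urlparse(href).path.lstrip("/")
        if pattern.match(path):
            kept.append(href)
    return kept


# Hypothetical hrefs, as they might come from link["href"] in the loop above
sample = [
    "/recette/tarte",
    "https://example.com/Recette/soupe",
    "/contact",
    "recette-du-jour.html",
]

# The original-style pattern rejects an href with a leading slash
leading_slash_match = re.match(r"\w*recette", "/recette/tarte", re.I)

result = hrefs_starting_with(sample, "recette")
print(result)
```

If the word may appear anywhere in the href rather than at the start of the path, re.search with the same escaped pattern would be the looser alternative.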