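I have stored a web page's source code in a string variable. I know the source contains a particular date, and I want to print the first link that appears before that date. The link sits between double quotes (""). Here is the code: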
import requests
from datetime import date
import re
link = "https://www.google.com.mx/search?biw=1535&bih=799&tbm=nws&q=%22New+Strong+Buy%22+site%3A+zacks.com&oq=%22New+Strong+Buy%22+site%3A+zacks.com&gs_l=serp.3...1632004.1638057.0.1638325.24.24.0.0.0.0.257.2605.0j15j2.17.0....0...1c.1.64.serp..8.0.0.Nl4BZQWwR3o"
fetch_data = requests.get(link)
content = str(fetch_data.content)
# this is the source code as a string
Months = ["January","February","March","April","May","June","July","August","September","October","November","December"]
today = date.today()
A = "%s %s" % (Months[today.month - 1], today.day)
a = today.day
B = A in content
if B == True:
    B = "%s %s" % (Months[today.month - 1], a)
else:
    while B == False:
        a = a - 1
        B = "%s %s" % (Months[today.month - 1], a)
# B is the date string that should appear in the content string
c = '"https:'
Z = "%s(.*)%s" % (c, B)
result = re.search(Z, content)
print(result)
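This is what I tried: I search for the substring between the variables c and B, but the code finds nothing.
If you look at the source of the link, you will see that today's date, "December 27", appears only once, and just before it the link I am interested in appears as "https://www.zacks.com/commentary/98986/new-strong-buy-stocks-for-december-27th".
Can someone help me automate this in Python so that it finds this link and prints it?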
Answer 0 (score: 0)
As Barmar said, you are better off using a DOM parser such as BeautifulSoup. Here is an example:
from BeautifulSoup import BeautifulSoup
import requests, urlparse
from datetime import datetime
link = "https://www.google.com.mx/search?biw=1535&bih=799&tbm=nws&q=%22New+Strong+Buy%22+site%3A+zacks.com&oq=%22New+Strong+Buy%22+site%3A+zacks.com&gs_l=serp.3...1632004.1638057.0.1638325.24.24.0.0.0.0.257.2605.0j15j2.17.0....0...1c.1.64.serp..8.0.0.Nl4BZQWwR3o"
r = requests.get(link)
soup = BeautifulSoup(r.text)
search = datetime.today().strftime("%B %d")
print("Searching for {}".format(search))
result = None
for i in soup.findAll('h3'):
    linkText = i.getText()
    if search in linkText:
        # the href is a redirect; the real target is in its "q" parameter
        result = i.find('a').get('href')
        result = result.split('?')[-1]
        result = urlparse.parse_qs(result)['q'][0]
        break
print(result)
The output I get is:
Searching for December 27
https://www.zacks.com/commentary/98986/new-strong-buy-stocks-for-december-27th
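Note that the code above targets Python 2: it uses BeautifulSoup 3 and the `urlparse` module. The maintained release of the parser is imported with `from bs4 import BeautifulSoup`, and on Python 3 the query-string helpers live in `urllib.parse`. Also, `%d` zero-pads the day, so on the 5th the search string would be "December 05", which would miss a page that writes "December 5". Below is a minimal Python 3 sketch of those two helper steps, not a definitive implementation. The names `date_label` and `unwrap_redirect`, the year 2016, and the sample href are my own assumptions for illustration; the real result links may be shaped differently.

```python
from datetime import date
from urllib.parse import parse_qs, urlsplit


def date_label(day):
    """Format a date as e.g. 'December 5', without zero-padding the day."""
    # %B is the full month name in the current locale (English by default).
    return "{} {}".format(day.strftime("%B"), day.day)


def unwrap_redirect(href):
    """Return the URL held in the 'q' query parameter, or None if absent."""
    values = parse_qs(urlsplit(href).query).get("q")
    return values[0] if values else None


# Hypothetical href, shaped like the redirect links the answer parses.
sample_href = ("/url?q=https://www.zacks.com/commentary/98986/"
               "new-strong-buy-stocks-for-december-27th&sa=U")

# The year is an arbitrary assumption; only month and day are used.
print(date_label(date(2016, 12, 27)))
print(unwrap_redirect(sample_href))
```

With the sample values this should print December 27 followed by the zacks.com URL. Returning None when there is no `q` parameter avoids the KeyError that `parse_qs(...)['q']` would raise on a link of a different shape.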