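The latest version of my working script is included at the bottom of this post. I am still figuring out how to wiki it.
Good day, I have the following code and I would like to know how to search the result for a match. I will be trying to match two or three words. I have tried html2text, BeautifulSoup, re.search and several others. Either I am not implementing the things I tried correctly, or they simply do not work.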
import requests
s = requests.session()
url = 'http://company.name.com/donor/index.php'
values = {'username': '1234567',
'password': '7654321'}
r = s.post(url, data=values)
# page which requires being logged in to view
url = "http://company.name.com/donor/donor.php"
# sending cookies as well
result = s.get(url)
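I have tried many different approaches but cannot get it to work. Which module do I need to use? Do I need to change the form of the data that "result" is in? One thing I have not tried is writing "result" to a text file. I suppose I could do that and then search that file for my matches... I just feel there must be a really simple way to do this.
Thanks for any help or direction.
Updated/edited script: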
# Script will log in, navigate to the correct page, search for a match, then print and text/SMS the result.
import re
import urllib
import smtplib
import requests
from bs4 import BeautifulSoup
s = requests.session()
url = 'http://company.name.com/donor/index.php'
values = {'username': '123456',
'password': '654321'}
r = s.post(url, data=values)
# Now you have logged in
url = "http://company.name.com/donor/donor.php"
# sending cookies as well
result = s.get(url)
print (result.headers)
print (result.text)
result2 = (result.text)
match1 = re.findall('FindMe', result2)  # we are trying to find "FindMe" in "result2"
if match1:  # if we find at least one match
    matchresult = 'Yes it matched'
    print(matchresult)
else:  # if we don't find a match
    matchresult = 'Houston we have a problem'
    print(matchresult)
# send text from gmail account portion of code starts here.
body = matchresult
body = "" + body + ""
headers = ["From: " + 'Senders Name',
"Subject: " + 'Type Subject Information',
"To: " + '1234567890@mms.att.net', #phone number and cell carrier @address
"MIME-Version: 1.0",
"Content-Type: text/html"]
headers = "\r\n".join(headers)
session = smtplib.SMTP('smtp.gmail.com', 587)
session.ehlo()
session.starttls()
session.ehlo()
session.login('anemailaddress@gmail.com', 'passwordforemailaddress')
session.sendmail('senders name', '1234567890@mms.att.net', headers + "\r\n\r\n" + body)
session.quit()
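For the messaging portion, one possible alternative to joining header strings by hand is the standard library's email.message.EmailMessage. This is only a minimal sketch: build_sms_message is a hypothetical helper name, the addresses are the placeholders from the script, and it builds a plain-text message rather than the text/html one used in the script.

```python
from email.message import EmailMessage


def build_sms_message(body, sender, recipient, subject):
    """Build a plain-text message for an email-to-SMS gateway address.

    Hypothetical helper for illustration; not part of the original script.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Placeholder values taken from the script; replace with real ones.
msg = build_sms_message(
    "Yes it matched",
    "anemailaddress@gmail.com",
    "1234567890@mms.att.net",
    "Type Subject Information",
)
print(msg["To"])
```

With a logged-in smtplib.SMTP object such as session in the script, session.send_message(msg) should be able to send a message built this way.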
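Answer 0 (score: 1)
I am still not sure I understood the question correctly, but based on the additional information in your comment, this should be enough: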
import urllib2
page = urllib2.urlopen("http://your.url.com")
content = page.read()
if "congratulations" in content:
    print ...
if "We're sorry" in content:
    print ...
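When you are looking for very specific words, you do not need a regular expression to match some more general pattern, or an HTML parser to inspect the structure of the document. Just check whether the string is in the document, using Python's in operator.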
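The same idea can be applied to the requests session from the question, since result.text is already a plain string. Below is a minimal sketch under a few assumptions: find_phrases and sample_html are hypothetical names, sample_html stands in for result.text so the example runs without logging in, and the comparison ignores case, which may or may not be what you want.

```python
def find_phrases(text, phrases):
    """Return the phrases that occur in text, ignoring case.

    Hypothetical helper for illustration only.
    """
    lowered = text.lower()
    return [p for p in phrases if p.lower() in lowered]


# Stand-in for result.text from the logged-in requests session.
sample_html = "<html><body><p>Congratulations, you are eligible!</p></body></html>"

found = find_phrases(sample_html, ["congratulations", "We're sorry"])
if found:
    print("Matched:", ", ".join(found))
else:
    print("No match")
```

In the real script you would pass result.text instead of sample_html and use the outcome to set matchresult.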