Why does requests.post get no response from the Clustal Omega service?

Asked: 2017-03-07 20:36:45

Tags: python bioinformatics

import requests
MSA_request=""">G1
MGCTLSAEDKAAVERSKMIDRNLREDGEKAAREVKLLLL
>G2
MGCTVSAEDKAAAERSKMIDKNLREDGEKAAREVKLLLL
>G3
MGCTLSAEERAALERSKAIEKNLKEDGISAAKDVKLLLL"""
q={"stype":"protein","sequence":MSA_request,"outfmt":"clustal"}
r=requests.post("http://www.ebi.ac.uk/Tools/msa/clustalo/",data=q)

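This is my script. I send this request to the website, but the result looks as if I had done nothing: the web service does not seem to receive my request. The same approach has worked fine with other websites. Could it be that this page shows a pop-up asking for cookie consent?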

1 Answer:

Answer 0: (score: 1)

The form on the page you are referring to submits to a separate URL, namely

http://www.ebi.ac.uk/Tools/services/web_clustalo/toolform.ebi

You can verify this with the DOM inspector in your browser. So, to proceed with requests, you need to post to the correct page:

r=requests.post("http://www.ebi.ac.uk/Tools/services/web_clustalo/toolform.ebi",data=q)
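The form target can also be looked up programmatically instead of through the DOM inspector. The following is only a minimal sketch for Python 3 using the standard library: the helper names (FormActionCollector, form_targets) and the sample markup are made up for illustration, and the real page may contain several forms or build its form with JavaScript, in which case this approach would not find it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionCollector(HTMLParser):
    """Collect the 'action' attribute of every <form> tag in a page."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            action = dict(attrs).get("action")
            if action:
                self.actions.append(action)


def form_targets(page_url, page_html):
    """Return the absolute URLs that the forms in page_html submit to."""
    collector = FormActionCollector()
    collector.feed(page_html)
    # A relative action is resolved against the URL of the page itself.
    return [urljoin(page_url, action) for action in collector.actions]


# Hypothetical markup, standing in for the HTML of the real page.
SAMPLE_HTML = (
    '<html><body>'
    '<form action="/Tools/services/web_clustalo/toolform.ebi" method="post">'
    '</form></body></html>'
)

print(form_targets("http://www.ebi.ac.uk/Tools/msa/clustalo/", SAMPLE_HTML))
```

With the real site, page_html would be the text of a requests.get on the page from the question.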

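Posting to toolform.ebi submits a job with your input data; it does not return the result directly. To check the result, you have to extract the job ID from the previous response and then issue another request (without data) to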

http://www.ebi.ac.uk/Tools/services/web_clustalo/toolresult.ebi?jobId=...

However, you should definitely check whether this kind of programmatic access is compatible with the site's terms of service...

Here is an example:

from lxml import html
import requests
import sys
import time

MSA_request=""">G1
MGCTLSAEDKAAVERSKMIDRNLREDGEKAAREVKLLLL
>G2
MGCTVSAEDKAAAERSKMIDKNLREDGEKAAREVKLLLL
>G3
MGCTLSAEERAALERSKAIEKNLKEDGISAAKDVKLLLL"""
q={"stype":"protein","sequence":MSA_request,"outfmt":"clustal"}

r = requests.post("http://www.ebi.ac.uk/Tools/services/web_clustalo/toolform.ebi",data = q)
tree = html.fromstring(r.text)
title = tree.xpath('//title/text()')[0]

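#check the status and get the job id (the title is expected to read "Job running: <job id>")
#partition() always yields three parts, so a title without ':' does not raise
status, _, job_id = [s.strip() for s in title.partition(':')]
if status != "Job running":
    sys.exit("unexpected job status: %s" % status)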

#it might take some time for the job to finish
time.sleep(10)

#download the results
r = requests.get("http://www.ebi.ac.uk/Tools/services/web_clustalo/toolresult.ebi?jobId=%s" % (job_id))

#prints the full response
#print(r.text)

#isolate the alignment block
tree = html.fromstring(r.text)
alignment = tree.xpath('//pre[@id="alignmentContent"]/text()')[0]
print(alignment)
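The fixed time.sleep(10) above may be too short for larger inputs and needlessly long for small ones. One possible alternative is to poll the result page until the alignment shows up. This is only a sketch: poll_until and its parameters are hypothetical names, and it assumes that the result page of an unfinished job simply lacks the alignmentContent element, which should be checked against the real site.

```python
import time


def poll_until(fetch, is_ready, interval=5.0, max_attempts=12, sleep=time.sleep):
    """Call fetch() until is_ready(result) is true or the attempts run out.

    fetch and is_ready are supplied by the caller, so the helper itself
    makes no assumption about the web service.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if is_ready(result):
            return result
        # Wait between attempts, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError("job did not finish after %d attempts" % max_attempts)
```

In the script above, fetch could be lambda: requests.get(result_url).text, where result_url is the toolresult.ebi address built from job_id, and is_ready could be lambda text: 'alignmentContent' in text. A generous interval keeps the load on the service low.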