Trying to post to a form on a site that uses frames and retrieve the data with Python

Posted: 2014-06-29 20:20:44

Tags: python post python-requests multipartform-data frames

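Thanks to Northcat and others, I was able to use requests to post a multipart/form-data request to http://www.camp.bicnirrh.res.in/featcalc/, and it works like a charm. I am now trying to post data to http://pro-161-70.ib.unicamp.br/~itaraju/tools/pimw/ and select only the "Show pI/MW values" option. I am uploading a file named Denovo. Here is what I have tried so far, following the same format as in my previous question.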

import requests
from urllib.parse import urlencode

session = requests.Session()
file = {'file': (open('Bishop/Denovo.txt', 'r').read())}
url = 'http://pro-161-70.ib.unicamp.br/~itaraju/tools/pimw/pimw.htm'
payload = {"opShowpimw": "opShowpimw", "opUseTabs": "opUseTabs"}
raw = urlencode(payload)
response = session.post(url, files=file, data=payload)
print(response.text)

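The URL in my code differs from the one listed at the top because the site uses frames and returns "This page uses frames, but your browser doesn't support them." So I found the URL above by looking at 'view frame source'. The payload comes from ieheaders. The first entry in the payload corresponds to "Show pI/MW values"; the second is a shot in the dark, an attempt to make things easier by getting the output back as text (on the form, clicking the ".txt" format). The response contains no values and looks like the first page. The frame source URL on the results page is "http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/pimw.cgi", but using this URL produces no response.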
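
The 'view frame source' step described above can also be done in code. The following is only a minimal sketch using the standard library: it lists the documents a frameset points to, then the action URL and field names of any form in a page. The two HTML snippets are made-up stand-ins, not the site's actual markup, and the names FrameAndFormFinder and inspect_page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameAndFormFinder(HTMLParser):
    """Collect frame sources and form actions (with field names) from a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frames = []  # absolute URLs of the framed documents
        self.forms = []   # one dict per <form>: action, method, fields

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("frame", "iframe") and attrs.get("src"):
            self.frames.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "form":
            self.forms.append({
                "action": urljoin(self.base_url, attrs.get("action") or ""),
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            })
        elif tag in ("input", "textarea", "select") and self.forms and attrs.get("name"):
            # attach the field to the most recently opened form
            self.forms[-1]["fields"].append(attrs["name"])


def inspect_page(html, base_url):
    """Return (frame URLs, forms) found in an HTML string."""
    finder = FrameAndFormFinder(base_url)
    finder.feed(html)
    return finder.frames, finder.forms


# Assumed sample markup, for illustration only.
BASE = "http://pro-161-70.ib.unicamp.br/~itaraju/tools/pimw/"
FRAMESET = '<frameset cols="20%,80%"><frame src="menu.htm"><frame src="pimw.htm"></frameset>'
FORM_PAGE = """
<form action="/~itaraju/cgi-bin/itaraju/bioinf/pimw.cgi" method="POST" enctype="multipart/form-data">
  <input type="file" name="arquivo">
  <textarea name="tbSeq"></textarea>
  <input type="checkbox" name="opShowpimw">
</form>
"""

frames, _ = inspect_page(FRAMESET, BASE)
_, forms = inspect_page(FORM_PAGE, frames[1])
print(frames)
print(forms[0]["action"], forms[0]["fields"])
```

The point of the sketch is that the POST should go to the form's action URL with the form's own field names, rather than to the page that merely displays the form.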

1 Answer:

Answer 0 (score: 1)

I send the sequence as text in tbSeq.

I found this sequence at http://pro-161-70.ib.unicamp.br/~itaraju/tools/pimw/what.htm

It gives me some results and an image (shown below), which is saved to disk as 'output.gif'.

import requests
import lxml.html

url = 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/pimw.cgi'
payload = {
    'arquivo': '',
    'opShowTitle': 'ON',
    'opShowSeq': 'ON',
    'opShowStat': 'ON',
    'opShowpimw': 'ON',
    'opGelVirtual': 'ON',
    'opMap': 'gel0.def',
    'opPK': 'Default',
    'tbCt': 3.55,
    'tbNt': 7,
    'tbArg': 12.01,
    'tbAsp': 4.06,
    'tbCys': 9,
    'tbGlu': 4.45,
    'tbHis': 5.985,
    'tbLys': 10.01,
    'tbTyr': 10.01,
    'tbSeq': '''>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVH
CTNLMNTTVTTGLLLNGSYSENRTQIWQKHRTSNDS
ALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQ
KYNLRLRQAWCHFPSNWKGAWKEVKEEIVNLPKER
YRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPG
PCVQRTYVACHIRSVIIWLETISKKTYAPPREGHLECT
STVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRY
KLVEITPIGFAPTEVRRYTGGHERQKRVPFVXXXXXX
XXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK''',
}

# send POST    
r = requests.post(url, data=payload)

#print r.text

# convert HTML string into HTML tree
html = lxml.html.fromstring(r.text)

# get all images
imgs = html.cssselect('img')

# get second image
if len(imgs) > 1:
    url = 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/' + imgs[1].attrib['src'].strip()

    print("Downloading ...", url)

    r = requests.get(url, stream=True)
    # stop here if the download failed, instead of writing an error page to disk
    r.raise_for_status()

    with open('output.gif', 'wb') as handle:
        for block in r.iter_content(1024):
            if not block:
                break

            handle.write(block)
            print('.', end=' ')

    print()

# get data
for tr in html.cssselect('tr'):
    # 'tr' is kept as in the original: it matches the row itself plus any nested rows
    for td in tr.cssselect('tr'):
        print(td.text_content().strip().replace('\n', ' | '), end=' ')
    print()

Result:

Downloading ... http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/../../../tools/htdocs/tmp/gel.15548.gif
. . . . . . . . . . . . . . . . . . . . . . . . . .


ORF:
gi|532319|pir|TVFV2E|TVFV2E envelope protein
Sequence:
ELRLRYCAPAGFALLKCNDADYDGFKTNCS NVSVVHCTNLMNTTVTTGLLLNGSYSENRT QIWQKHRTSNDSALILLNKHYNLTVTCKRP GNKTVLPVTIMAGLVFHSQKYNLRLRQAWC HFPSNWKGAWKEVKEEIVNLPKERYRGTND PKRIFFQRQWGDPETANLWFNCHGEFFYCK MDWFLNYLNNLTVDADHNECKNTSGTKSGN KRAPGPCVQRTYVACHIRSVIIWLETISKK TYAPPREGHLECTSTVTGMTVELNYIPKNR TNVTLSPQIESIWAAELDRYKLVEITPIGF APTEVRRYTGGHERQKRVPFVXXXXXXXXX XXXXXXXXXXXXXVQSQHLLAGILQQQKNL LAAVEAQQQMLKLTIWGVK
MW: |       pI:
40969.02 |  |   9.35
Amino-acid composition
Ala (A) | 20 | 5.3% |  | Cys (C) | 12 | 3.2% |  | Asp (D) | 10 | 2.6% |  | Glu (E) | 19 | 5.0% |  | Phe (F) | 12 | 3.2% |  | Gly (G) | 20 | 5.3% |  | His (H) | 11 | 2.9% |  | Ile (I) | 16 | 4.2% |  | Lys (K) | 24 | 6.3% |  | Leu (L) | 34 | 9.0% |  |    |  |  | Met (M) | 5 | 1.3% |  | Asn (N) | 27 | 7.1% |  | Pro (P) | 16 | 4.2% |  | Gln (Q) | 17 | 4.5% |  | Arg (R) | 21 | 5.5% |  | Ser (S) | 16 | 4.2% |  | Thr (T) | 30 | 7.9% |  | Val (V) | 24 | 6.3% |  | Trp (W) | 10 | 2.6% |  | Tyr (Y) | 13 | 3.4% Ala (A) | 20 | 5.3% Cys (C) | 12 | 3.2% Asp (D) | 10 | 2.6% Glu (E) | 19 | 5.0% Phe (F) | 12 | 3.2% Gly (G) | 20 | 5.3% His (H) | 11 | 2.9% Ile (I) | 16 | 4.2% Lys (K) | 24 | 6.3% Leu (L) | 34 | 9.0% Met (M) | 5 | 1.3% Asn (N) | 27 | 7.1% Pro (P) | 16 | 4.2% Gln (Q) | 17 | 4.5% Arg (R) | 21 | 5.5% Ser (S) | 16 | 4.2% Thr (T) | 30 | 7.9% Val (V) | 24 | 6.3% Trp (W) | 10 | 2.6% Tyr (Y) | 13 | 3.4%
Ala (A) | 20 | 5.3%
Cys (C) | 12 | 3.2%
Asp (D) | 10 | 2.6%
Glu (E) | 19 | 5.0%
Phe (F) | 12 | 3.2%
Gly (G) | 20 | 5.3%
His (H) | 11 | 2.9%
Ile (I) | 16 | 4.2%
Lys (K) | 24 | 6.3%
Leu (L) | 34 | 9.0%
Met (M) | 5 | 1.3%
Asn (N) | 27 | 7.1%
Pro (P) | 16 | 4.2%
Gln (Q) | 17 | 4.5%
Arg (R) | 21 | 5.5%
Ser (S) | 16 | 4.2%
Thr (T) | 30 | 7.9%
Val (V) | 24 | 6.3%
Trp (W) | 10 | 2.6%
Tyr (Y) | 13 | 3.4%
Total:  | 379
Theoretical 2D gel:

The small red dot :)

[image: output.gif]
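
The download URL printed in the result contains `../../../` because the image's `src` is relative and was simply appended to the script's directory. That works, but as a small sketch of an alternative, `urllib.parse.urljoin` resolves the relative path against the page URL and gives a normalized address. The helper name resolve_image_url is hypothetical; the `src` value is the one visible in the output above.

```python
from urllib.parse import urljoin

# URL of the CGI script whose HTML response contains the <img> tags
CGI_URL = "http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/pimw.cgi"


def resolve_image_url(page_url, src):
    """Resolve an <img src> value against the URL of the page that contained it."""
    return urljoin(page_url, src.strip())


image_url = resolve_image_url(CGI_URL, " ../../../tools/htdocs/tmp/gel.15548.gif ")
print(image_url)
# http://pro-161-70.ib.unicamp.br/~itaraju/tools/htdocs/tmp/gel.15548.gif
```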


EDIT: example with a file. The file must be sent in the field named arquivo.

import requests
import lxml.html

url = 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/pimw.cgi'
payload = {
#    'arquivo': '', # remove it
    'opShowTitle': 'ON',
    'opShowSeq': 'ON',
    'opShowStat': 'ON',
    'opShowpimw': 'ON',
    'opGelVirtual': 'ON',
    'opMap': 'gel0.def',
    'opPK': 'Default',
    'tbCt': 3.55,
    'tbNt': 7,
    'tbArg': 12.01,
    'tbAsp': 4.06,
    'tbCys': 9,
    'tbGlu': 4.45,
    'tbHis': 5.985,
    'tbLys': 10.01,
    'tbTyr': 10.01,
    'tbSeq': '',
}

files = {'arquivo': open('sequence.fasta').read()}

#url = 'http://httpbin.org/post' # special portal for tests

# send POST    
r = requests.post(url, data=payload, files=files)

#print r.text

# convert HTML string into HTML tree
html = lxml.html.fromstring(r.text)

# get all images
imgs = html.cssselect('img')

# get second image
if len(imgs) > 1:
    url = 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/' + imgs[1].attrib['src'].strip()

    print("Downloading ...", url)

    r = requests.get(url, stream=True)
    # stop here if the download failed, instead of writing an error page to disk
    r.raise_for_status()

    with open('output.gif', 'wb') as handle:
        for block in r.iter_content(1024):
            if not block:
                break

            handle.write(block)
            print('.', end=' ')

    print()

# get data
for tr in html.cssselect('tr'):
    # 'tr' is kept as in the original: it matches the row itself plus any nested rows
    for td in tr.cssselect('tr'):
        print(td.text_content().strip().replace('\n', ' | '), end=' ')
    print()

The file sequence.fasta that was used:

>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVH
CTNLMNTTVTTGLLLNGSYSENRTQIWQKHRTSNDS
ALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQ
KYNLRLRQAWCHFPSNWKGAWKEVKEEIVNLPKER
YRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPG
PCVQRTYVACHIRSVIIWLETISKKTYAPPREGHLECT
STVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRY
KLVEITPIGFAPTEVRRYTGGHERQKRVPFVXXXXXX
XXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
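
To check that the upload really ends up in the field named arquivo without contacting the server, the request can be prepared and inspected locally. This is just a sketch: the short FASTA bytes are a stand-in for sequence.fasta, only two of the form fields are included for brevity, and build_upload is a hypothetical helper name. It relies on `requests.Request(...).prepare()`, which builds the multipart body but sends nothing.

```python
import requests

URL = "http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/pimw.cgi"

# Stand-in content; in practice this would be read from sequence.fasta
FASTA = b">example\nELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVH\n"


def build_upload(url, fasta_bytes):
    """Prepare (but do not send) a multipart POST with the file in 'arquivo'."""
    request = requests.Request(
        "POST",
        url,
        data={"opShowpimw": "ON", "tbSeq": ""},
        # (filename, content) tuple: the dict key is the form field name
        files={"arquivo": ("sequence.fasta", fasta_bytes)},
    )
    return request.prepare()


prepared = build_upload(URL, FASTA)
print(prepared.headers["Content-Type"])
print(prepared.body.decode("utf-8", errors="replace"))
```

In the printed body, the dictionary key passed to `files` shows up as `name="arquivo"` in the Content-Disposition header of the file part, which is the name the CGI script looks for.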