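How do I get the amino acid sequence from this URL?

Time: 2019-07-08 01:56:55

Tags: python web-scraping

I want to use Python and Selenium to get the amino acid sequence from the following URL, but I can't get it to work. http://flybase.org/download/sequence/FBgn0003719/FBpp

I have already tried Beautiful Soup and Selenium.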

from selenium import webdriver

driver = webdriver.Chrome()

driver.get('http://flybase.org/download/sequence/FBgn0003719/FBpp')

iframe = driver.find_element_by_class_name('scroller')

notification_element = driver.find_element_by_class_name('fastaSeq')

print(notification_element)
  

Message: no such element: Unable to locate element

2 answers:

Answer 0: (score: 0)

You can use Selenium to load the page and BeautifulSoup to access the sequence:

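from selenium import webdriver
from bs4 import BeautifulSoup as soup
# Start Chrome; replace the placeholder with the path to your chromedriver
d = webdriver.Chrome('/path/to/chromedriver')
d.get('http://flybase.org/download/sequence/FBgn0003719/FBpp')
# Parse the rendered HTML and read the text of the div with class "fastaSeq"
sequence = soup(d.page_source, 'html.parser').find('div', {'class':'fastaSeq'}).text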

Output:

'MKGMRLMPMK MKAKLVVLSV GALWMMMFFL VDYAEGRRLS QLPESECDFD FKEQPEDFFG ILDSSLVPPK EPKDDIYQLK TTRQHSGRRR KQSHKSQNKA ALRLPPPFLW TDDAVDVLQH SHSPTLNGQP IQRRRRAVTV RKERTWDYGV IPYEIDTIFS GAHKALFKQA MRHWENFTCI KFVERDPNLH ANYIYFTVKN CGCCSFLGKN GNGRQPISIG RNCEKFGIII HELGHTIGFH HEHARGDRDK HIVINKGNIM RGQEYNFDVL SPEEVDLPLL PYDLNSIMHY AKNSFSKSPY LDTITPIGIP PGTHLELGQR KRLSRGDIVQ ANLLYKCASC GRTYQQNSGH IVSPHFIYSG NGVLSEFEGS GDAGEDPSAE SEFDASLTNC EWRITATNGE KVILHLQQLH LMSSDDCTQD YLEIRDGYWH KSPLVRRICG NVSGEVITTQ TSRMLLNYVN RNAAKGYRGF KARFEVVCGG DLKLTKDQSI DSPNYPMDYM PDKECVWRIT APDNHQVALK FQSFELEKHD GCAYDFVEIR DGNHSDSRLI GRFCGDKLPP NIKTRSNQMY IRFVSDSSVQ KLGFSAALML DVDECKFTDH GCQHLCINTL GSYQCGCRAG YELQANGKTC EDACGGVVDA TKSNGSLYSP SYPDVYPNSK QCVWEVVAPP NHAVFLNFSH FDLEGTRFHY TKCNYDYLII YSKMRDNRLK KIGIYCGHEL PPVVNSEQSI LRLEFYSDRT VQRSGFVAKF VIDVDECSMN NGGCQHRCRN TFGSYQCSCR NGYTLAENGH NCTETRCKFE ITTSYGVLQS PNYPEDYPRN IYCYWHFQTV LGHRIQLTFH DFEVESHQEC IYDYVAIYDG RSENSSTLGI YCGGREPYAV IASTNEMFMV LATDAGLQRK GFKATFVSEC GGYLRATNHS QTFYSHPRYG SRPYKRNMYC DWRIQADPES SVKIRFLHFE IEYSERCDYD YLEITEEGYS MNTIHGRFCG KHKPPIIISN SDTLLLRFQT DESNSLRGFA ISFMAVDPPE DSVGEDFDAV TPFPGYLKSM YSSETGSDHL LPPSRLI'
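The text returned above is split into blocks of ten residues separated by spaces. If a single contiguous string is needed, the whitespace can be stripped afterwards. The following is a minimal sketch; `clean_sequence` is a hypothetical helper name, not part of Selenium or BeautifulSoup, and it assumes that only whitespace separates the residues.

```python
def clean_sequence(raw: str) -> str:
    """Remove all whitespace so the residues form one contiguous string.

    Assumption: the scraped text contains only residue letters plus
    spaces or newlines used for layout.
    """
    # str.split() with no arguments splits on any run of whitespace
    return "".join(raw.split())


# Short excerpt from the start of the output shown above
example = "MKGMRLMPMK MKAKLVVLSV GALWMMMFFL"
print(clean_sequence(example))  # MKGMRLMPMKMKAKLVVLSVGALWMMMFFL
```

With the `sequence` variable from the answer, this would be called as `clean_sequence(sequence)`.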

Answer 1: (score: 0)

Use the dedicated API found in the browser's "Network" tab; after that, a plain requests call is all you need:

import requests

r = requests.get('http://flybase.org/api/sequence/id/FBgn0003719/FBpp').json()
print(r['resultset']['result'][0]['sequence'])
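The snippet above indexes straight into the JSON and reads only the first result. A slightly more defensive variant might look like the sketch below. The key names `resultset`, `result` and `sequence` are taken from the snippet above; `extract_sequences` is a hypothetical helper, and whether the API can return several result entries or omit a key is an assumption, not something confirmed here.

```python
from typing import Any, Dict, List


def extract_sequences(payload: Dict[str, Any]) -> List[str]:
    """Collect every 'sequence' value under payload['resultset']['result'].

    Assumption: the payload has the same shape as in the answer above.
    Missing keys yield an empty list instead of raising KeyError.
    """
    results = payload.get("resultset", {}).get("result", [])
    # Skip entries that are not dicts or that carry no 'sequence' field
    return [
        item["sequence"]
        for item in results
        if isinstance(item, dict) and "sequence" in item
    ]


# Made-up payload with the same shape, for illustration only
fake = {"resultset": {"result": [{"sequence": "MKGMRLMPMK"}, {"id": "x"}]}}
print(extract_sequences(fake))  # ['MKGMRLMPMK']
```

With the parsed response `r` from the answer, `extract_sequences(r)` would return a list, so the first sequence is `extract_sequences(r)[0]` when the list is non-empty.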