I'm trying to scrape the text y6 inside <g class="jbfraglines">, but both select and find_all return an empty list.
HTML (simplified)
<form method="GET" enctype="application/x-www-form-urlencoded" action="peptide_view.pl">
<div id="xi:container">
<svg id="xi:svg-container" xmlns="http://www.w3.org/2000/svg" version="1.1" baseProfile="full" width="800" height="400" style="width: 800px; height: 400px; background: white; border: 1px solid black; overflow: hidden; cursor: default;">
<defs><filter x="0" y="0" width="100%" height="100%" id="opaqueBackground"><feFlood flood-color="#ffffff" flood-opacity="1" result="bg"></feFlood><feMerge><feMergeNode in="bg"></feMergeNode><feMergeNode in="SourceGraphic"></feMergeNode></feMerge></filter></defs>
<g class="view-label"><text text-anchor="start" x="742" y="352" transform="rotate(0)" id="id1" style="undefined" class="label">observed</text></g>
<g class="jbresidue"></g><g class="jbresidue"><text text-anchor="middle" x="35" y="60" dx="0" dy="0" transform="rotate(0 35,60)" id="id19" class="aa">A</text></g>
<g class="jbresidue"><text text-anchor="middle" x="55" y="60" dx="0" dy="0" transform="rotate(0 55,60)" id="id20" class="aa">A</text></g>
<g class="jbfraglines"><line x1="45" y1="67" x2="45" y2="37" id="id21" stroke="#000000" stroke-width="1"></line><line x1="45" y1="37" x2="55" y2="27" id="id22" stroke="#000000" stroke-width="1"></line><line x1="45" y1="67" x2="35" y2="77" id="id23" stroke="#999999" stroke-width="1" style="visibility: hidden;"></line>
<text text-anchor="start" x="45" y="30" dx="0" dy="0" transform="rotate(-45 45,30)" id="id24" fill="#000000">y6</text>
<text text-anchor="start" x="38" y="90" dx="0" dy="0" transform="rotate(-45 38,90)" id="id25" style="visibility: hidden;">b1</text></g>
…
My code
import requests
import urllib.request
import time
from bs4 import BeautifulSoup
...
print(soup.select('g[class="jbfraglines"]'))
>> []
print(soup.find_all('g[class="jbfraglines"]'))
>> []
Since the g element sits inside <div id="xi:container"> and <form method="GET" ...>, I tried selecting those as well, but they also returned an empty list or an error.
print(soup.find_all('div[id="xi:container"]'))
>> []
print(soup.select('div[id="xi:container"]'))
>> UnicodeEncodeError: 'cp932' codec can't encode character '\xa0' in position 114: illegal multibyte sequence
print(soup.select('form'))
>> UnicodeEncodeError: 'cp932' codec can't encode character '\xa0' in position 114: illegal multibyte sequence
print(soup.find_all('form'))
>> UnicodeEncodeError: 'cp932' codec can't encode character '\xa0' in position 114: illegal multibyte sequence
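The three tracebacks above name the 'cp932' codec, which suggests the error is raised when print() writes the result to a console using the Japanese Windows code page, not inside select or find_all: U+00A0 (a non-breaking space) has no cp932 encoding. That also hints that select('form'), find_all('form') and select('div[id="xi:container"]') did match something, since printing an empty list would never need to encode '\xa0'. A minimal sketch, assuming the console encoding is cp932; to_console_safe is a hypothetical helper, not part of BeautifulSoup:

```python
# Sketch: reproduce and avoid the cp932 UnicodeEncodeError seen when printing.
# Assumption: sys.stdout uses cp932, as the tracebacks in the question suggest.


def to_console_safe(text, encoding="cp932"):
    """Return `text` with characters `encoding` cannot represent written as escapes like \\xa0."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


sample = "y6\xa0b1"  # contains U+00A0, the character named in the traceback

try:
    sample.encode("cp932")  # roughly what print() does on a cp932 console
except UnicodeEncodeError as exc:
    print("cp932 cannot encode:", repr(exc.object[exc.start:exc.end]))

print(to_console_safe(sample))  # prints y6\xa0b1 (a literal backslash escape)
```

On Python 3.7 or later, calling sys.stdout.reconfigure(encoding="utf-8") before printing, or setting the PYTHONIOENCODING=utf-8 environment variable, changes the stream's encoding instead; how the console then displays the text depends on its own settings.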
response = requests.get(link) returns <Response [200]>, so I'm sure I'm reaching the correct page. What is going wrong? What do I need to do to scrape the text inside the form or the svg?
Another thing I noticed is that this HTML contains <script type="text/javascript" src="../templates/peptide_view.js?2.006001"></script>. Could the problem be related to the JavaScript?
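A 200 status only says the server answered; it does not say the SVG is in the HTML it sent. requests does not execute JavaScript, so if peptide_view.js builds the <svg> contents in the browser, that markup will not be in response.text. One way to check is a stdlib-only sketch (FragLabelParser and fragment_labels are hypothetical names) that collects the text of every <text> element inside a <g class="jbfraglines"> group, so it can be run on response.text and on HTML copied from the browser's inspector and the two results compared:

```python
from html.parser import HTMLParser
from typing import List


class FragLabelParser(HTMLParser):
    """Collect the text of <text> elements nested inside <g class="jbfraglines">."""

    def __init__(self):
        super().__init__()
        self._g_stack = []   # one entry per open <g>: True if it is a jbfraglines group
        self._buffer = None  # text chunks of the <text> element being read, if any
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag == "g":
            classes = (dict(attrs).get("class") or "").split()
            self._g_stack.append("jbfraglines" in classes)
        elif tag == "text" and any(self._g_stack):
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "g" and self._g_stack:
            self._g_stack.pop()
        elif tag == "text" and self._buffer is not None:
            self.labels.append("".join(self._buffer).strip())
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def fragment_labels(html: str) -> List[str]:
    """Return labels such as "y6" found in jbfraglines groups (hidden ones like b1 included)."""
    parser = FragLabelParser()
    parser.feed(html)
    parser.close()
    return parser.labels


snippet = (
    '<svg><g class="jbfraglines"><line x1="45" y1="67"></line>'
    '<text id="id24" fill="#000000">y6</text>'
    '<text id="id25" style="visibility: hidden;">b1</text></g></svg>'
)
print(fragment_labels(snippet))  # ['y6', 'b1']
```

If fragment_labels(response.text) returns [] while the inspector copy yields y6, the group is most likely generated client-side; in that case the options would be to locate whatever data the script loads, or to render the page with a browser automation tool such as Selenium before parsing.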