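At first I tried bs4, but the table is not plain HTML text, which is why I switched to Selenium.
I am trying to scrape the table data, but I don't know how to get the information out.
This is what I have so far: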
table = browser.find_element_by_id("name_list")
cell = table.find_elements_by_xpath("//td[@style='text-align:center']")
The table cells look like this:
<td style="text-align:center" class="left"><script
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>
I want to get "John". How can I get it?
Answer 0 (score: 1)
You can do it with BeautifulSoup.
If the <td> contains a <script>, you can iterate over .children and take the second (last) element, because the first one is the <script>.
from bs4 import BeautifulSoup as BS
html = '''<td style="text-align:center" class="left"><script
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>'''
soup = BS(html, 'html.parser')
td = soup.find('td')
text = list(td.children)[1]  # children are [<script>, '"John"']
print(text.strip('"'))  # John (this sample string contains literal quotes)
Alternatively, you can find the <script> and extract() it, so that only the text is left in the <td>:
from bs4 import BeautifulSoup as BS
html = '''<td style="text-align:center" class="left"><script
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>'''
soup = BS(html, 'html.parser')
td = soup.find('td')
td.find('script').extract()  # remove the <script> element from the tree
text = td.text.strip('"')  # strip the literal quotes in this sample string
print(text)  # John
If you need the text inside Base64.decode("MTA0LjI0OC4xMTUuMjM2"), you can find the <script> and get its text. Slicing gives you MTA0LjI0OC4xMTUuMjM2, which you can decode with the base64 module. The result is 104.248.115.236.
from bs4 import BeautifulSoup as BS
import base64
html = '''<td style="text-align:center" class="left"><script
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>'''
soup = BS(html, 'html.parser')
td = soup.find('td')
script = td.find('script').text  # 'document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))'
text = script[30:-3]  # drop 'document.write(Base64.decode("' (30 chars) and '"))' (3 chars)
text = base64.b64decode(text).decode()
print(text)  # 104.248.115.236
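The fixed slice [30:-3] only works while the script text is exactly document.write(Base64.decode("...")). A minimal sketch of a less brittle variant, assuming the payload is always the single quoted argument of Base64.decode(...), captures it with a regular expression instead; decode_script_payload and PAYLOAD_RE are hypothetical names used only for illustration.

```python
import base64
import re

# Hypothetical pattern: captures the quoted Base64 argument of Base64.decode("...").
PAYLOAD_RE = re.compile(r'Base64\.decode\("([A-Za-z0-9+/=]+)"\)')


def decode_script_payload(script_text):
    """Return the decoded Base64 argument, or None if the pattern is absent."""
    match = PAYLOAD_RE.search(script_text)
    if match is None:
        return None
    return base64.b64decode(match.group(1)).decode()


script = 'document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))'
print(decode_script_payload(script))  # 104.248.115.236
```

Returning None instead of raising makes it easy to skip cells that do not contain an encoded value.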
Answer 1 (score: 0)
You can also get the text with Selenium itself.
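Make sure the XPath starts with a . (for example .//td[@style='text-align:center'] instead of //td[@style='text-align:center']) so that the search is limited to the current table node.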
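A sketch of the scoped lookup, assuming Selenium 4 (where find_element_by_id and find_elements_by_xpath were replaced by find_element/find_elements with By locators) and assuming each cell's innerHTML still looks like the snippet in the question. If the browser has already run document.write, the decoded text may also appear in the cell, and the helper would need adjusting. To keep it dependency-free, the <script> is removed with a regular expression, which is only adequate for a single simple script like this one; BeautifulSoup as in answer 0 is sturdier. text_after_script and names_from_table are hypothetical helper names.

```python
import html
import re

# Matches one <script ...>...</script> element, including line breaks inside the tag.
SCRIPT_RE = re.compile(r"<script\b.*?</script>", re.IGNORECASE | re.DOTALL)


def text_after_script(inner_html):
    """Drop <script> elements from a cell's inner HTML and return what remains."""
    text = SCRIPT_RE.sub("", inner_html)
    # Unescape entities, trim whitespace, and remove the quotes shown around "John".
    return html.unescape(text).strip().strip('"')


def names_from_table(browser, table_id="name_list"):
    """Needs a live Selenium 4 WebDriver; not exercised in the example below."""
    from selenium.webdriver.common.by import By

    table = browser.find_element(By.ID, table_id)
    # ".//td" searches only inside `table`; "//td" would search the whole page.
    cells = table.find_elements(By.XPATH, ".//td[@style='text-align:center']")
    return [text_after_script(cell.get_attribute("innerHTML")) for cell in cells]


sample = ('<script\ntype="text/javascript">'
          'document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"')
print(text_after_script(sample))  # John
```

The Selenium import sits inside names_from_table so the parsing helper can be tried without a browser.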