Scraping data from a table

Time: 2019-07-12 20:53:52

Tags: python selenium

First I tried bs4, but the table is not plain HTML text, which is why I moved to Selenium.

I'm trying to scrape the table data, but I don't know how to get at the information.

What I have now is:

table =  browser.find_element_by_id("name_list")  
cell = table.find_elements_by_xpath("//td[@style='text-align:center']")

The table data looks like this:

<td style="text-align:center" class="left"><script   
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>

I want to get "John", but how can I get it?

2 answers:

Answer 0 (score: 1)

You can do it with BeautifulSoup.

If the <td> contains a <script>, you can use the .children iterator and take the second (last) element; the first element is the <script>.

from bs4 import BeautifulSoup as BS

html = '''<td style="text-align:center" class="left"><script   
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>'''

soup = BS(html, 'html.parser')
td = soup.find('td')

# children: [<script>...</script>, '"John"'], so index 1 is the text node
text = list(td.children)[1].strip('"')

print(text) # John

Or you can find the <script> and extract() it, so the <td> is left with only the text.

from bs4 import BeautifulSoup as BS

html = '''<td style="text-align:center" class="left"><script   
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>'''

soup = BS(html, 'html.parser')
td = soup.find('td')

td.find('script').extract()  # remove <script>, leaving only the text node
text = td.text.strip('"')

print(text) # John
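The question collects every centered cell into a list, so the extract() approach may need to run once per cell. A minimal sketch, assuming the raw HTML of the table is available as a string: the table id name_list comes from the question, while the second row ("Mary") and the function name names_from_table are hypothetical, added only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two cells shaped like the one in the question.
HTML = '''<table id="name_list"><tr>
<td style="text-align:center" class="left"><script type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>
<td style="text-align:center" class="left"><script type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"Mary"</td>
</tr></table>'''


def names_from_table(html, table_id="name_list"):
    """Return the text that follows the <script> in each centered <td> of the table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(id=table_id)
    names = []
    # The style=... keyword matches the attribute value exactly; searching from
    # `table` rather than `soup` keeps the lookup inside this one table.
    for td in table.find_all("td", style="text-align:center"):
        script = td.find("script")
        if script is not None:
            script.extract()  # drop the script so only the text node remains
        names.append(td.get_text(strip=True).strip('"'))
    return names


print(names_from_table(HTML))  # ['John', 'Mary']
```

Cells without a <script> are handled too, since the extract() call is skipped when td.find("script") returns None.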

If you need the text inside Base64.decode("MTA0LjI0OC4xMTUuMjM2"), you can find the <script> and get its text. With slicing you get the string MTA0LjI0OC4xMTUuMjM2, which you can decode with the base64 module to get 104.248.115.236.

from bs4 import BeautifulSoup as BS
import base64

html = '''<td style="text-align:center" class="left"><script   
type="text/javascript">document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))</script>"John"</td>'''

soup = BS(html, 'html.parser')
td = soup.find('td')

script = td.find('script').text

text = script[30:-3]  # strip 'document.write(Base64.decode("' and '"))'

text = base64.b64decode(text).decode()

print(text) # 104.248.115.236
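The slice script[30:-3] only works while the script text is exactly document.write(Base64.decode("...")); extra whitespace or a different wrapper shifts the offsets. A slightly more tolerant sketch pulls the argument out with a regular expression instead. It assumes the argument is a double-quoted, standard (not URL-safe) Base64 string; the names decode_script and B64_CALL are hypothetical. It uses only the standard library and can take td.find('script').text as input.

```python
import base64
import re

# Assumption: the argument is a double-quoted, standard Base64 string.
B64_CALL = re.compile(r'Base64\.decode\("([A-Za-z0-9+/=]+)"\)')


def decode_script(script_text):
    """Return the decoded argument of Base64.decode("..."), or None if there is none."""
    match = B64_CALL.search(script_text)
    if match is None:
        return None
    return base64.b64decode(match.group(1)).decode()


script = 'document.write(Base64.decode("MTA0LjI0OC4xMTUuMjM2"))'
print(decode_script(script))  # 104.248.115.236
```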

Answer 1 (score: 0)

You can get the text with a line like the one in the question, with one change.

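Make sure a "." is present at the start of the XPath (".//td[...]" instead of "//td[...]") to limit the search to the current table node.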
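In Selenium 4 the find_element_by_* / find_elements_by_* helpers used in the question were deprecated and later removed, so a minimal sketch of the same lookup with the relative XPath might look like this. It passes the locator strategies as the strings "id" and "xpath" (the values of By.ID and By.XPATH); the function name cell_texts is hypothetical. Note that cell.text is the text as rendered by the browser, so it may also contain whatever the inline script wrote.

```python
# Minimal sketch for Selenium 4; `browser` is any WebDriver instance.
# "id" and "xpath" are the string values of By.ID and By.XPATH.
CELL_XPATH = ".//td[@style='text-align:center']"  # leading "." = search inside the table only


def cell_texts(browser, table_id="name_list"):
    """Return the rendered text of each centered <td> inside the table with id `table_id`."""
    table = browser.find_element("id", table_id)
    # Without the leading ".", "//td[...]" is evaluated from the document root
    # and would also match centered cells in other tables on the page.
    cells = table.find_elements("xpath", CELL_XPATH)
    return [cell.text for cell in cells]
```

Usage, with a page URL of your own: create a driver (for example webdriver.Firefox()), call browser.get(url), then print(cell_texts(browser)).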