Question

I am new to Python and am trying to extract a table from a page, but I cannot find the table using BS4. Can you tell me where I am going wrong?

Answer 1

soup.find('table') returns the following:

 <table style="margin: auto;">
<tbody>
<tr>
<td><a class="selected" href="">INK CHART :</a></td>
<td style="width: 220px;">
<input class="selector" id="searchbox" name="selector" onblur="clearTimeout(ctime)" onfocus="init()" onkeyup="refresh()" style="width: 100%" type="text"/>
<iframe border="0" frameborder="0" id="DivShim" style="display:none;position: absolute;"></iframe>
<div class="sbox" id="sbox" style="z-index: 10; ">
</div>
</td>
<td style="width: 120px;">
<select class="up" name="type" style="width: 100%">
<option value="can">Candle-Stick</option>
<option value="po">Point and Figure</option>
<option value="fundamentals">Fundamentals</option>
</select></td>
<td>
<input class="search_button" type="submit" value=""/>
</td>
</tr>
</tbody>
</table>

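That is the page's search form, not the table you want. The results table is not in the HTML that BS4 receives, so you will need a tool that renders JavaScript, such as Selenium or Splash. https://selenium-python.readthedocs.io/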
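To illustrate why soup.find('table') only finds the search form, here is a minimal sketch. STATIC_HTML is a made-up stand-in for what the server sends before any JavaScript runs (an assumption, not the real page source), and the id DataTables_Table_0 is the one used in Answer 2.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML the server sends before JavaScript runs.
# The real page is larger; only the structure that matters here is kept.
STATIC_HTML = """
<html><body>
<table style="margin: auto;"><tr><td>INK CHART :</td></tr></table>
<div id="results"></div>
</body></html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# find() returns the first <table> in document order: the search form.
first_table = soup.find("table")

# The results table is injected later by JavaScript, so it is absent here.
results_table = soup.find("table", id="DataTables_Table_0")

all_tables = soup.find_all("table")

print(first_table.get_text(strip=True))
print(results_table)
print(len(all_tables))
```

If the lookup by id returns None on the real page source as well, the table is being built in the browser and a plain HTTP fetch will not contain it.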

Answer 2

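The content is added dynamically by a POST request that returns JSON. That request requires authentication through cookies and headers, so using Selenium is probably simpler. Grab the element by its id and pass its outerHTML to pandas' read_html to convert it into a nicely formatted table: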

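from io import StringIO
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

d = webdriver.Chrome()
d.get('https://chartink.com/screener/copy-supertrend-negative-breakout-1103')
# Wait for the JavaScript-built table; find_element_by_id was removed in Selenium 4, so use By.ID
element = WebDriverWait(d, 10).until(EC.presence_of_element_located((By.ID, 'DataTables_Table_0')))
table = pd.read_html(StringIO(element.get_attribute('outerHTML')))[0]  # newer pandas expects a file-like object
print(table)
d.quit()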
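Once Selenium has returned the outerHTML string, it can also be parsed with Beautiful Soup, which the question already uses. The sketch below assumes a table with a header row of th cells. OUTER_HTML, its column names, and its values are made-up placeholders, and table_to_rows is a hypothetical helper, not part of any library.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for element.get_attribute('outerHTML').
# Column names and values are placeholders, not real screener data.
OUTER_HTML = """
<table id="DataTables_Table_0">
  <thead><tr><th>Symbol</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>AAA</td><td>10.5</td></tr>
    <tr><td>BBB</td><td>20.0</td></tr>
  </tbody>
</table>
"""


def table_to_rows(outer_html):
    """Return the table body as a list of dicts keyed by header text."""
    table = BeautifulSoup(outer_html, "html.parser").find("table")
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has only <th> cells, so it is skipped
            rows.append(dict(zip(headers, cells)))
    return rows


rows = table_to_rows(OUTER_HTML)
print(rows)
```

All cell values come back as strings here; pandas' read_html additionally infers numeric types, which is one reason the answer prefers it.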
