BeautifulSoup: finding the nth table without using find_all()

Time: 2019-01-24 22:55:12

Tags: python beautifulsoup wikipedia

I want to find the nth table using BeautifulSoup. So far, this has done the job for me.

table = soup.find_all('table',{'class':'wikitable sortable jquery-tablesorter'})[nth]

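However, if I know for certain that the table I want is the nth one, where I define n, is there a way to avoid searching through and saving all the preceding tables? I feel my code would run faster if there were a way to fetch only the nth table. The tables come from Wikipedia.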

1 Answer:

Answer 0: (score: 0)

You can use nth-of-type with .select. I am not sure whether this will make your code run any faster; for that, have a look at the "Improving performance" section of the documentation.

from bs4 import BeautifulSoup
html="""
<table class="1">
</table>
<table class="2">
</table>
<table class="3">
</table>
<table class="4">
</table>
<table class="5">
</table>
"""
soup=BeautifulSoup(html,'html.parser')
print(soup.select('table:nth-of-type(3)'))

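Output:

[<table class="3"> </table>]

The CSS selector .class:nth-of-type(n) does not seem to work well with BeautifulSoup. However, if you know the class of the tables' parent, you can use something like '.parent table:nth-of-type(n)'.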
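A possible explanation for the behaviour of .class:nth-of-type(n), offered as a sketch rather than a definitive diagnosis: in CSS, :nth-of-type counts sibling elements by tag name, so .tbl:nth-of-type(2) means "an element with class tbl that is also the second element of its type among its siblings", not "the second element with class tbl". The snippet below assumes a recent Beautiful Soup (4.7 or later, where select() is backed by the soupsieve package). The HTML, the ids t1 to t3 and the class names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first table does NOT carry the "tbl" class.
html = """
<div class="parent">
<table class="other" id="t1"></table>
<table class="tbl" id="t2"></table>
<table class="tbl" id="t3"></table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches elements that have class "tbl" AND are the 2nd <table> among siblings.
by_class = [t["id"] for t in soup.select(".tbl:nth-of-type(2)")]

# The 2nd element carrying class "tbl", picked by plain list indexing.
second_tbl = soup.select(".tbl")[1]["id"]

print(by_class)    # the table that is second by tag position
print(second_tbl)  # the table that is second by class
```

The two results differ here because the counting is done per tag, not per class. Scoping the selector to a known parent, as in the example that follows, counts the table elements inside that parent only.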

from bs4 import BeautifulSoup
html="""
<div class="parent1">
<table class="tbl">
not our table 1
</table>
<table class="tbl">
not our table 2
</table>
</div>
<div class="parent2">
<table class="tbl">
our table 1
</table>
<table class="tbl">
our table 2
</table>
</div>
"""
soup=BeautifulSoup(html,'html.parser')
print(soup.select('.parent2 table:nth-of-type(2)'))

Output:

[<table class="tbl"> our table 2 </table>]

The above output can also be achieved with […].
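Coming back to the original question of not searching past the nth table, here is a minimal sketch of three options that exist in Beautiful Soup: the limit argument of find_all(), select_one(), and parsing only tables with SoupStrainer. The markup, the id "content" and the ids a to d are invented for the example, and no timing was measured, so whether any of these is noticeably faster on a real Wikipedia page is an open assumption. Parsing the page usually costs more than the search itself.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical page fragment with four tables inside one container.
html = """
<div id="content">
<p>intro</p>
<table class="wikitable" id="a"></table>
<table class="wikitable" id="b"></table>
<table class="wikitable" id="c"></table>
<table class="wikitable" id="d"></table>
</div>
"""
n = 3  # 1-based position of the table we want

soup = BeautifulSoup(html, "html.parser")

# Option 1: find_all stops gathering results once `limit` matches are found.
# The first n-1 tables are still collected, but nothing after the nth is visited.
first_n = soup.find_all("table", class_="wikitable", limit=n)
nth_by_limit = first_n[-1] if len(first_n) == n else None

# Option 2: select_one returns a single element instead of a list.
nth_by_selector = soup.select_one(f"#content table:nth-of-type({n})")

# Option 3: build a smaller tree that contains only <table> elements.
only_tables = SoupStrainer("table")
small_soup = BeautifulSoup(html, "html.parser", parse_only=only_tables)
nth_from_strained = small_soup.find_all("table", limit=n)[-1]

print(nth_by_limit["id"], nth_by_selector["id"], nth_from_strained["id"])
```

Option 1 stays closest to the code in the question. Option 3 reduces the size of the tree that gets built, which is the approach the "Improving performance" section of the documentation points to.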