Question

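This is data I extracted from a web source (XML/HTML) with Python, and it is in table format. I want to put only the data marked with ** ** into an array, as [] [], but I'm not sure how to do it. A single array that stores the items one after another would also work.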

My idea is to keep the symbol BHEL and its value 80.50 together as a single element, so that I can use it in my code.
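Here is a minimal sketch of that idea. It assumes BeautifulSoup 4 is installed and that the ** markers are only highlighting, not part of the real page. The code reads the text of the <b> tag, splits it at the space, and keeps the symbol and the price together as one tuple. The sample string is a hypothetical, trimmed copy of the question's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed sample of the question's markup (** highlighting removed).
html = '<span>Underlying Stock: <b style="font-size:1.2em;">BHEL 80.50</b> </span>'

soup = BeautifulSoup(html, "html.parser")
bold = soup.find("b")  # the first <b> tag holds "BHEL 80.50"

# Split "BHEL 80.50" into symbol and price, then store them as one element.
symbol, price_text = bold.get_text(strip=True).split()
quote = (symbol, float(price_text))

print(quote)  # ('BHEL', 80.5)
```

float() drops the trailing zero (80.5). If the exact text matters, keep price_text or use decimal.Decimal(price_text) instead.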

<table width="100%"><tr><td>
<div class="tphead"><h2>Option Chain (Equity Derivatives)</h2></div>
</td><td align="right">
<div style="float:right; font-size:1.2em;">
<span>**Underlying Stock:** <b style="font-size:1.2em;">**BHEL** **80.50**</b> </span>
<span>**As on May 11, 2018 15:30:30 IST**<a> <img onclick="refresh();" src="/live_market/resources/images/refressbtn.gif" style="cursor: pointer" title="refresh"/></a></span></div>
</td></tr></table>

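I want to filter out just this data and store it item by item.

The array should look like the following. Any Python code that helps would be appreciated.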

Option Chain (Equity Derivatives)
Underlying Stock: BHEL 80.50
As on
May 11, 2018
15:30:30 IST
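Here is a minimal sketch that produces a list close to the one above. It again assumes BeautifulSoup 4 and uses a hypothetical copy of the question's markup without the ** highlighting. The code collects the text of the <h2> and <span> tags. get_text(" ", strip=True) joins the pieces inside each tag with a single space and drops whitespace-only pieces. Unlike the list above, the "As on" line stays as one string.

```python
from bs4 import BeautifulSoup

# Hypothetical copy of the question's markup, without the ** highlighting.
page = (
    '<table width="100%"><tr><td>'
    '<div class="tphead"><h2>Option Chain (Equity Derivatives)</h2></div>'
    '</td><td align="right">'
    '<div style="float:right; font-size:1.2em;">'
    '<span>Underlying Stock: <b style="font-size:1.2em;">BHEL 80.50</b> </span>'
    '<span>As on May 11, 2018 15:30:30 IST<a> <img onclick="refresh();" '
    'src="/live_market/resources/images/refressbtn.gif" title="refresh"/></a></span>'
    '</div></td></tr></table>'
)

soup = BeautifulSoup(page, "html.parser")

# One list element per <h2> or <span>, in document order.
rows = [tag.get_text(" ", strip=True) for tag in soup.find_all(["h2", "span"])]

for row in rows:
    print(row)
```

Selecting the tags by name keeps each heading or label together with its value, which a flat walk over every text node would split apart.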

Answer 1

It's not entirely clear what you need, but it looks like you want to use BeautifulSoup4 to get the text inside the HTML tags.

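from bs4 import BeautifulSoup

extracted_text = []
soup = BeautifulSoup(your_string, 'html.parser')  # your_string holds the extracted HTML
# Visit only the direct children of the document, not every nested tag
for tag in soup.find_all(recursive=False):
    text = tag.text.strip()  # all text inside the tag, trimmed
    if text:  # skip tags with no visible text
        extracted_text.append(text)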

your_string is the HTML code you extracted.
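This sketch shows what the loop above returns. It runs the loop unchanged on a hypothetical copy of the question's markup (without the ** highlighting), assuming BeautifulSoup 4 is installed. The whole snippet is wrapped in a single <table>, so the top level holds only one tag. As a result, the list gets one element containing all of the table's text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the question's markup without the ** highlighting.
your_string = (
    '<table width="100%"><tr><td>'
    '<div class="tphead"><h2>Option Chain (Equity Derivatives)</h2></div>'
    '</td><td align="right">'
    '<div style="float:right; font-size:1.2em;">'
    '<span>Underlying Stock: <b style="font-size:1.2em;">BHEL 80.50</b> </span>'
    '<span>As on May 11, 2018 15:30:30 IST<a> <img onclick="refresh();" '
    'src="/live_market/resources/images/refressbtn.gif" title="refresh"/></a></span>'
    '</div></td></tr></table>'
)

extracted_text = []
soup = BeautifulSoup(your_string, "html.parser")
for tag in soup.find_all(recursive=False):  # only the top-level <table>
    text = tag.text.strip()
    if text:
        extracted_text.append(text)

print(len(extracted_text))  # 1: the whole snippet is one top-level tag
```

For a finer split, the tags of interest (such as h2 and span) have to be selected by name rather than by depth.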

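recursive=False makes find_all return only the direct children instead of descending into nested tags. Otherwise the text of an inner tag is also collected as part of every enclosing tag, so the same text is extracted twice (or more).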
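This small sketch shows the difference, using a hypothetical two-level snippet and assuming BeautifulSoup 4. With the default recursive=True, find_all() returns both the outer <span> and the inner <b>, so "BHEL" is collected twice.

```python
from bs4 import BeautifulSoup

# Hypothetical nested markup: "BHEL" sits inside both <span> and <b>.
html = "<span>Stock: <b>BHEL</b></span>"
soup = BeautifulSoup(html, "html.parser")

# recursive=True (the default) visits <span> and then the <b> inside it.
deep = [t.text for t in soup.find_all()]
# recursive=False stops at the top level, so only <span> is visited.
shallow = [t.text for t in soup.find_all(recursive=False)]

print(deep)     # ['Stock: BHEL', 'BHEL']
print(shallow)  # ['Stock: BHEL']
```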
