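Extracting variable data from a p tag

Time: 2018-04-17 16:39:34

Tags: python python-3.x beautifulsoup

How can I use Python to extract 249.30 251.50 252.55 246.80 248.20 from the code below (assuming the number of digits is variable, i.e. instead of 249.30 it could be 2.4 or 2490.30)?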

    <html>
    <body>
    <p>
     BSE##B#As on 17 Apr 18 | 16:00@C#7@P#@HL#249.30,251.50,252.55,246.80,248.20,Listed
    </p>
    </body>
   </html>

2 Answers:

Answer 0 (score: 1)

Use BeautifulSoup to get the text of the p tag, then pull the numbers out with a regular expression.

Demo:

    import re
    from bs4 import BeautifulSoup

    s = """<html>
        <body>
        <p>
         BSE##B#As on 17 Apr 18 | 16:00@C#7@P#@HL#249.30,251.50,252.55,246.80,248.20,Listed
        </p>
        </body>
       </html>"""

    soup = BeautifulSoup(s, "html.parser")
    # Text content of the first <p> tag
    print(soup.find("p").text)
    # Every run of digits, a dot, and more digits (raw string avoids escape warnings)
    print(re.findall(r"\d+\.\d+", soup.find("p").text))

Output:

    BSE##B#As on 17 Apr 18 | 16:00@C#7@P#@HL#249.30,251.50,252.55,246.80,248.20,Listed
    ['249.30', '251.50', '252.55', '246.80', '248.20']
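Because the question says the number of digits can vary, a value could also arrive without a decimal part, which `\d+\.\d+` would skip. One option is to isolate the comma-separated fields after the `@HL#` marker instead of scanning the whole string. The following is only a sketch: it assumes `@HL#` always precedes the values (the single sample suggests this but does not confirm it), `extract_prices` is a hypothetical helper name, and the sample text has been altered to include 2.4, 2490.30 and a bare integer.

```python
import re

# Text of the <p> tag, with values changed to show varying digit counts
text = "BSE##B#As on 17 Apr 18 | 16:00@C#7@P#@HL#2.4,2490.30,252.55,246,248.20,Listed"

def extract_prices(text):
    """Return the numeric fields that follow the '@HL#' marker as floats."""
    # Keep only what comes after the (assumed) marker; empty if it is absent
    _, _, tail = text.partition("@HL#")
    prices = []
    for field in tail.split(","):
        field = field.strip()
        # Accept integers and decimals of any length, skip fields like "Listed"
        if re.fullmatch(r"\d+(?:\.\d+)?", field):
            prices.append(float(field))
    return prices

print(extract_prices(text))
```

Anchoring on the marker keeps the date and time digits (17, 18, 16:00) out of the result without relying on them lacking a decimal point.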

Answer 1 (score: -1)

The following regular expression should match these numbers: (\d+[\.])?\d+

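    import re

    # content is assumed to hold the text of the <p> tag
    content = "BSE##B#As on 17 Apr 18 | 16:00@C#7@P#@HL#249.30,251.50,252.55,246.80,248.20,Listed"

    # Same pattern with a non-capturing group, so findall returns whole matches;
    # note that it also matches bare integers such as 17 and 18
    regex = re.compile(r'(?:\d+\.)?\d+')
    print(regex.findall(content))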
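To see how the pattern `(\d+[\.])?\d+` behaves on the sample text, the sketch below compares `match` with `findall`, with and without a capturing group. The variable names are illustrative only, and `content` is assumed to be the text of the p tag.

```python
import re

content = "BSE##B#As on 17 Apr 18 | 16:00@C#7@P#@HL#249.30,251.50,252.55,246.80,248.20,Listed"

pattern_with_group = re.compile(r"(\d+[\.])?\d+")
pattern_no_group = re.compile(r"(?:\d+\.)?\d+")

# match() only tries at position 0, and the text starts with a letter
anchored = pattern_with_group.match(content)

# With a capturing group, findall() returns the group, not the whole match
groups_only = pattern_with_group.findall(content)

# With a non-capturing group, findall() returns each whole match
whole_matches = pattern_no_group.findall(content)

print(anchored)
print(groups_only)
print(whole_matches)
```

Since the integer part is optional in this pattern, the date and time digits are matched as well, so the result still has to be filtered to the values of interest.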