Find a number based on the text before it in Python 2.7

Time: 2013-10-16 13:06:04

Tags: python html string python-2.7

How can I use Python to extract 34980 and 100329 from the following snippet?

<tr id="product_34980" class="even">
<tr id="variant_100329" class="variantRow">

2 answers:

Answer 0 (score: 3)

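Using filter with str.isdigit, the following code extracts the digits from each line. On Python 2.7, filter applied to a str returns a str, so each result is a plain string of digits.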

>>> lines = '''<tr id="product_34980" class="even">
... <tr id="variant_100329" class="variantRow">
... '''
>>> [filter(str.isdigit, line) for line in lines.splitlines()]
['34980', '100329']
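
On Python 3, filter returns an iterator instead of a str, so the one-liner above would not produce strings directly. The following is a minimal sketch of the same idea that should behave the same on both versions; the helper name digits_per_line is hypothetical and not part of the original answer.

```python
lines = '''<tr id="product_34980" class="even">
<tr id="variant_100329" class="variantRow">
'''


def digits_per_line(text):
    """Return, for each line, all of its digit characters joined into one string."""
    # Hypothetical helper: keeps every digit on the line, wherever it appears.
    return [''.join(ch for ch in line if ch.isdigit())
            for line in text.splitlines()]


print(digits_per_line(lines))
```

Like the original, this keeps every digit on the line, so a line containing other digits (for example class="row2") would have them merged into the result.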

Update: using lxml

import lxml.html

html_string = '''
<tr id="product_34980" class="even">
<tr id="variant_100329" class="variantRow">
'''

root = lxml.html.fromstring(html_string)
for tr in root.cssselect('tr.even, tr.variantRow'):
    print(tr.get('id')) # => product_34980, then variant_100329
    print(tr.get('id').rsplit('_', 1)[-1]) # => 34980, then 100329
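
If installing lxml is not an option, a similar result can probably be obtained with the standard library alone. The sketch below swaps in html.parser (Python 3 module name; on Python 2.7 the module is called HTMLParser) in place of lxml. The class name RowIdCollector is hypothetical, and unlike the CSS selector above it collects the id of every tr tag regardless of its class.

```python
from html.parser import HTMLParser

html_string = '''
<tr id="product_34980" class="even">
<tr id="variant_100329" class="variantRow">
'''


class RowIdCollector(HTMLParser):
    """Collect the part after the last underscore of each <tr> id attribute."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == 'tr':
            row_id = dict(attrs).get('id')
            if row_id and '_' in row_id:
                self.ids.append(row_id.rsplit('_', 1)[-1])


collector = RowIdCollector()
collector.feed(html_string)
print(collector.ids)
```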

Answer 1 (score: 0)

Not the most general solution, but it works for the snippet above:

import re

html = """
    <tr id="product_34980" class="even">
    <tr id="variant_100329" class="variantRow">
"""

ids = re.findall(r'id="\w+_(\d+)"', html)
# ids is now ['34980', '100329']
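
If it also matters which prefix each number belongs to, the pattern can capture both parts. This is a small sketch along the same lines, not part of the original answer; the function name extract_ids and the choice to convert the numbers to int are assumptions made for illustration.

```python
import re

html = """
    <tr id="product_34980" class="even">
    <tr id="variant_100329" class="variantRow">
"""

# Group 1: the text before the underscore; group 2: the trailing number.
ID_PATTERN = re.compile(r'id="(\w+?)_(\d+)"')


def extract_ids(text):
    """Return (prefix, number) pairs for every id of the form prefix_number."""
    return [(prefix, int(number)) for prefix, number in ID_PATTERN.findall(text)]


print(extract_ids(html))
```

As with the original regular expression, this only looks at the raw text, so it would also match an id attribute on tags other than tr.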