Question

I want to extract 1 Pack, 4 Pack Gift Set, 1 Pencil with Erasers, ... from

[<span class="a-size-base">1 Pack</span>, <span class="a-size-base">4 Pack Gift Set</span>, <span class="a-size-base">1 Pencil with Erasers</span>, <span class="a-size-base">1 Pencil with Lead and Erasers</span>]

in Python.

Thanks

Answer 1

The easiest way is to use Beautiful Soup, the de facto Python library for parsing HTML. Get it by downloading the source or by running pip install bs4.

from bs4 import BeautifulSoup

string = '[<span class="a-size-base">1 Pack</span>, <span class="a-size-base">4 Pack Gift Set</span>, <span class="a-size-base">1 Pencil with Erasers</span>, <span class="a-size-base">1 Pencil with Lead and Erasers</span>]'

# Represent the string as a nested data structure
soup = BeautifulSoup(string, "html.parser")
# Find all <span> tags in the BeautifulSoup object
spans = soup.find_all('span')
# Get the text inside the <span> tags
print([span.text for span in spans])

This gives you a list of the content you want:

['1 Pack', '4 Pack Gift Set', '1 Pencil with Erasers', '1 Pencil with Lead and Erasers']

Answer 2

Use the standard library module re (regular expression operations).


The output is: 1 Pack, 4 Pack Gift Set, 1 Pencil with Erasers, 1 Pencil with Lead and Erasers
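
One way the re approach might look is sketched below. It assumes the input is the single string from the question and uses re.findall with a non-greedy capture group. The pattern and the variable names are illustrative assumptions, not necessarily what this answer originally used.

```python
import re

# Input string copied from the question
string = '[<span class="a-size-base">1 Pack</span>, <span class="a-size-base">4 Pack Gift Set</span>, <span class="a-size-base">1 Pencil with Erasers</span>, <span class="a-size-base">1 Pencil with Lead and Erasers</span>]'

# Non-greedy capture of whatever sits between <span ...> and </span>
items = re.findall(r'<span[^>]*>(.*?)</span>', string)

# Join the captured texts into a single comma-separated line
print(', '.join(items))
```

A regular expression like this is fine for flat, predictable markup such as the example, but it will not cope with nested tags; for arbitrary HTML a real parser is the safer choice.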

Answer 3

Could you elaborate on your question and your data structure? Assuming your data is a list of strings:

import re
l = ['<span class="a-size-base">1 Pack</span>', '<span class="a-size-base">4 Pack Gift Set</span>', '<span class="a-size-base">1 Pencil with Erasers</span>', '<span class="a-size-base">1 Pencil with Lead and Erasers</span>']
print([re.match(r'<([a-zA-Z]+).+>(.+)</\1>', i).group(2) for i in l])
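
Note that re.match returns None for any item that does not fit the pattern, so calling .group(2) on it raises AttributeError. A minimal sketch that skips such items follows. The input list here is hypothetical (it adds a non-tag string to show the effect), and the pattern is the same one used above.

```python
import re

# Hypothetical input: two items from the answer plus one string that is not a tag
l = ['<span class="a-size-base">1 Pack</span>', 'no tag here', '<span class="a-size-base">4 Pack Gift Set</span>']

# Same pattern as above: \1 requires the closing tag name to equal the opening one
pattern = re.compile(r'<([a-zA-Z]+).+>(.+)</\1>')

# Keep only the items that match; group(2) is the text between the tags
texts = [m.group(2) for m in map(pattern.match, l) if m]
print(texts)
```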

Extracting the contents of span tags in Python
