I want to extract 1 Pack, 4 Pack Gift Set, 1 Pencil with Erasers, ... from the following in Python:
[<span class="a-size-base">1 Pack</span>, <span class="a-size-base">4 Pack Gift Set</span>, <span class="a-size-base">1 Pencil with Erasers</span>, <span class="a-size-base">1 Pencil with Lead and Erasers</span>]
Thanks.
Answer 0 (score: 0)
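The simplest way is to use Beautiful Soup, the de facto standard Python library for parsing HTML. Get it by downloading the source or with pip install beautifulsoup4 (pip install bs4 also works, since that package just installs beautifulsoup4).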
from bs4 import BeautifulSoup
string = '[<span class="a-size-base">1 Pack</span>, <span class="a-size-base">4 Pack Gift Set</span>, <span class="a-size-base">1 Pencil with Erasers</span>, <span class="a-size-base">1 Pencil with Lead and Erasers</span>]'
# Represent the string as a nested data structure
soup = BeautifulSoup(string, "html.parser")
# Find all <span> tags in the BeautifulSoup object
spans = soup.find_all('span')
# Get the text inside the <span> tags
print([span.text for span in spans])
This gives you the list you want:
['1 Pack', '4 Pack Gift Set', '1 Pencil with Erasers', '1 Pencil with Lead and Erasers']
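If installing a third-party package is not an option, the standard library's html.parser module can do the same job. The sketch below is an alternative to Beautiful Soup, not what this answer used; the class name SpanTextParser is hypothetical, and it assumes the target spans carry the a-size-base class and contain no nested <span> tags.

```python
from html.parser import HTMLParser


class SpanTextParser(HTMLParser):
    """Hypothetical helper: collect the text inside <span class="a-size-base"> elements."""

    def __init__(self):
        super().__init__()
        self.in_target = False  # True while inside a matching <span>
        self.texts = []         # one entry per matching <span>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "a-size-base" in classes:
            self.in_target = True
            self.texts.append("")

    def handle_endtag(self, tag):
        # Assumes no nested <span> inside a target span
        if tag == "span":
            self.in_target = False

    def handle_data(self, data):
        # Text outside the target spans (the brackets and ", ") is ignored
        if self.in_target:
            self.texts[-1] += data


markup = ('[<span class="a-size-base">1 Pack</span>, '
          '<span class="a-size-base">4 Pack Gift Set</span>, '
          '<span class="a-size-base">1 Pencil with Erasers</span>, '
          '<span class="a-size-base">1 Pencil with Lead and Erasers</span>]')

parser = SpanTextParser()
parser.feed(markup)
parser.close()
print(parser.texts)
```

Checking the class attribute also skips any other <span> elements a real product page might contain.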
Answer 1 (score: 0)
Use the standard-library module re (regular expression operations).
The output is: 1 Pack, 4 Pack Gift Set, 1 Pencil with Erasers, 1 Pencil with Lead and Erasers
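A minimal sketch of the re approach, assuming the input is the bracketed string from the question. It uses re.findall with a non-greedy group; this is one workable pattern, not necessarily the one the answerer had in mind.

```python
import re

markup = ('[<span class="a-size-base">1 Pack</span>, '
          '<span class="a-size-base">4 Pack Gift Set</span>, '
          '<span class="a-size-base">1 Pencil with Erasers</span>, '
          '<span class="a-size-base">1 Pencil with Lead and Erasers</span>]')

# <span[^>]*> matches the opening tag whatever its attributes are.
# The non-greedy (.*?) stops at the first </span>, so each match is one item.
# With a single capture group, re.findall returns a list of the captured texts.
items = re.findall(r'<span[^>]*>(.*?)</span>', markup)
print(", ".join(items))
```

A regular expression is fine for a fixed snippet like this, but it breaks easily on nested tags or on text containing markup; an HTML parser is more robust for real pages.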
Answer 2 (score: 0)
Could you give more detail about your question and your data structure? Assuming your data is a list of strings:
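import re
items = ['<span class="a-size-base">1 Pack</span>', '<span class="a-size-base">4 Pack Gift Set</span>', '<span class="a-size-base">1 Pencil with Erasers</span>', '<span class="a-size-base">1 Pencil with Lead and Erasers</span>']
# <([a-zA-Z]+)[^>]*> matches the opening tag and captures its name; (.*?) captures the inner text;
# </\1> requires the closing tag to have the same name as the opening one.
print([re.match(r'<([a-zA-Z]+)[^>]*>(.*?)</\1>', i).group(2) for i in items])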
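re.match returns None when an item does not fit the pattern, so calling .group(2) directly raises AttributeError on such input. A minimal sketch that skips non-matching items instead; the helper name extract_texts and the sample list are hypothetical.

```python
import re

# Opening tag (name captured), inner text (captured lazily), matching closing tag.
TAG_TEXT = re.compile(r'<([a-zA-Z]+)[^>]*>(.*?)</\1>')


def extract_texts(items):
    """Return the inner text of each item, skipping items that do not match."""
    texts = []
    for item in items:
        m = TAG_TEXT.match(item)
        if m is not None:
            texts.append(m.group(2))
    return texts


sample = [
    '<span class="a-size-base">1 Pack</span>',
    'plain text without tags',
    '<span class="a-size-base">4 Pack Gift Set</span>',
]
print(extract_texts(sample))
```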