I have a list of links like the following:
links = ['http://www.website.com/category/subcategory/1',
'http://www.website.com/category/subcategory/2',
'http://www.website.com/category/subcategory/3',...]
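I want to extract 1, 2, 3, etc. from this list and store the extracted values in subcategory_explicit. The links are stored as str, but I can't access the values with the following code: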
subcategory_explicit = [cat.get('subcategory') for cat in links if cat.get('subcategory') is not None]
Do I have to change the data type from str to something else? What is a better way to get and store the extracted values?
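For context, the attempt fails because `.get()` is a `dict` method, not a `str` method, so calling it on each link raises `AttributeError`. The values do not need a different type; plain string methods work on them directly. A minimal sketch, assuming the number is always the last path segment of each link:

```python
links = [
    'http://www.website.com/category/subcategory/1',
    'http://www.website.com/category/subcategory/2',
    'http://www.website.com/category/subcategory/3',
]

# str has no .get() method (that belongs to dict), so this raises AttributeError
error_message = None
try:
    [cat.get('subcategory') for cat in links]
except AttributeError as exc:
    error_message = str(exc)

# Keep the links as str and split off the last path segment instead
subcategory_explicit = [cat.rsplit('/', 1)[-1] for cat in links if 'subcategory' in cat]

print(error_message)         # 'str' object has no attribute 'get'
print(subcategory_explicit)  # ['1', '2', '3']
```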
Answer 0 (score: 1)
subcategory_explicit = [i[i.find('subcategory'):] for i in links if 'subcategory' in i]
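This uses slicing to take the substring that starts at the "s" of "subcategory" and runs to the end of the string (for example 'subcategory/1'). If you add len('subcategory') to the value returned by find, you exclude "subcategory" itself and get "/#" (where # is the number).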
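The variant described above can be sketched like this; the extra `lstrip('/')` step, which removes the leading slash so only the number remains, is an addition for illustration:

```python
links = [
    'http://www.website.com/category/subcategory/1',
    'http://www.website.com/category/subcategory/2',
    'http://www.website.com/category/subcategory/3',
]

marker = 'subcategory'
# Start the slice just past 'subcategory', which leaves '/1', '/2', ...
with_slash = [i[i.find(marker) + len(marker):] for i in links if marker in i]
# Drop the leading '/' to keep only the number
subcategory_explicit = [s.lstrip('/') for s in with_slash]

print(with_slash)            # ['/1', '/2', '/3']
print(subcategory_explicit)  # ['1', '2', '3']
```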
Answer 1 (score: 1)
Try this (using the re module):
import re
links = [
'http://www.website.com/category/subcategory/1',
'http://www.website.com/category/subcategory/2',
'http://www.website.com/category/subcategory/3']
d = "|".join(links)
# 'http://www.website.com/category/subcategory/1|http://www.website.com/category/subcategory/2|http://www.website.com/category/subcategory/3'
# Capture the number after /category/<name>/ (a raw string avoids invalid-escape warnings)
pattern = re.compile(r"/category/\w+/(?P<item_id>\d+)", re.I)
subcategory_explicit = pattern.findall(d)
print(subcategory_explicit)
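Joining the links with "|" is not required; a minimal alternative sketch matches each link separately and converts the captured value to int. The group name `item_id` is hypothetical, and the `$` anchor assumes the number is always the last part of the URL:

```python
import re

links = [
    'http://www.website.com/category/subcategory/1',
    'http://www.website.com/category/subcategory/2',
    'http://www.website.com/category/subcategory/3',
]

# Capture the trailing number after /category/<name>/
pattern = re.compile(r"/category/\w+/(?P<item_id>\d+)$")

subcategory_explicit = []
for link in links:
    match = pattern.search(link)
    if match is not None:  # skip links that don't follow this layout
        subcategory_explicit.append(int(match.group('item_id')))

print(subcategory_explicit)  # [1, 2, 3]
```

Matching per link keeps each result tied to its source URL, and links that don't fit the pattern are skipped instead of silently producing partial matches across the joined string.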