I have some txt files that contain many URLs like these:
www.example.com/spare_parts/M2541.htm
www.example.com/spare_parts/M3511.htm
www.example.com/spare_parts/C6501.htm
www.example.com/spare_parts/M2800.htm
www.example.com/custom_parts/M1808.htm
www.example.com/custom_parts/R2202.htm
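What I want is a sorted list of links. I've managed to load my txt file and read it line by line in Python, but I can't sort it, because all the examples I've found search for words. In this case, I want the links for all the spare parts whose IDs start with M. Can anyone help?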
# texto2 and texto3 are my own functions (defined elsewhere, not shown)
for line in text:
    if 'spare_parts' in line:
        print(texto2(line))
    else:
        print(texto3(line))
Answer 0 (score: 1)
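import os

parts = []
FirstLetter = 'M'
with open('textfile.txt') as fp:  # the file is closed automatically
    for line in fp:
        if 'spare_parts' in line:
            # Last path segment, e.g. "M2541.htm"; splitext drops the ".htm" suffix.
            # (str.strip('.htm') would strip any of the characters . h t m from both ends.)
            part = os.path.splitext(line.rstrip().split('/')[-1])[0]
            if part.startswith(FirstLetter):
                parts.append(part)
print(sorted(parts))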
Output:
['M2541', 'M2800', 'M3511']
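The answer above returns only the part IDs, while the question asks for a sorted list of links. A minimal sketch that keeps the whole URL and sorts by the extracted part ID, assuming every link has the same host/spare_parts/ID.htm shape as the sample; the function name `spare_part_links` is hypothetical. It accepts any iterable of lines, so an open file object works as well as a list.

```python
import os


def spare_part_links(lines, first_letter="M"):
    """Return spare-part links whose ID starts with first_letter, sorted by ID.

    Hypothetical helper: assumes URLs shaped like host/spare_parts/ID.htm.
    """
    links = []
    for line in lines:
        url = line.strip()  # drop the trailing newline from file lines
        if "/spare_parts/" not in url:
            continue
        # Part ID = last path segment without its extension, e.g. "M2541"
        part_id = os.path.splitext(url.rsplit("/", 1)[-1])[0]
        if part_id.startswith(first_letter):
            links.append((part_id, url))
    # Sort the (ID, URL) pairs by ID, then keep only the URLs
    return [url for part_id, url in sorted(links)]


sample = [
    "www.example.com/spare_parts/M2541.htm\n",
    "www.example.com/spare_parts/M3511.htm\n",
    "www.example.com/spare_parts/C6501.htm\n",
    "www.example.com/spare_parts/M2800.htm\n",
    "www.example.com/custom_parts/M1808.htm\n",
    "www.example.com/custom_parts/R2202.htm\n",
]
print(spare_part_links(sample))
# ['www.example.com/spare_parts/M2541.htm', 'www.example.com/spare_parts/M2800.htm', 'www.example.com/spare_parts/M3511.htm']
```

With a real file you would call it as `with open('textfile.txt') as fp: print(spare_part_links(fp))`. Sorting by the ID rather than the full string matters only if the links could differ before the file name (for example, different hosts).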
Answer 1 (score: 0)
If I understand your question correctly, you are looking for a filter:
urlList = [
"www.example.com/spare_parts/M2541.htm",
"www.example.com/spare_parts/M3511.htm",
"www.example.com/spare_parts/C6501.htm",
"www.example.com/spare_parts/M2800.htm",
"www.example.com/custom_parts/M1808.htm",
"www.example.com/custom_parts/R2202.htm"
]
sparePartsStartingWithMList = [line for line in urlList if ("/spare_parts/M" in line)]
The important line is the last one. It is a list comprehension that returns a list of all the strings that contain "/spare_parts/M". The equivalent loop is:
sparePartsStartingWithMList = []
for line in urlList:
    if "/spare_parts/M" in line:
        sparePartsStartingWithMList.append(line)
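Because of the way your URLs are structured, all spare parts live under the /spare_parts directory, and each part's identifier is the file name. So you only need to search for the relevant part of the path: the folder name followed by the directory separator (/), then the first letter of the part identifier.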
The value of sparePartsStartingWithMList:
sparePartsStartingWithMList = [
"www.example.com/spare_parts/M2541.htm",
"www.example.com/spare_parts/M3511.htm",
"www.example.com/spare_parts/M2800.htm"
]
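The substring test relies on the path structure described above. A minimal sketch of checking that structure explicitly instead, assuming each URL is host/directory/filename as in the sample: split off the file name and compare the last directory component exactly. The helper name `is_spare_part` and the nested URL "www.example.com/spare_parts/Misc/X1.htm" are hypothetical, used only to show a case where the substring test and the exact check disagree. The result is also passed through `sorted()`, since the question asks for a sorted list.

```python
import posixpath


def is_spare_part(url, first_letter="M"):
    # Hypothetical helper.
    # "www.example.com/spare_parts/M2541.htm" -> ("www.example.com/spare_parts", "M2541.htm")
    directory, filename = posixpath.split(url.strip())
    # Compare the last directory component exactly instead of searching for a substring
    return posixpath.basename(directory) == "spare_parts" and filename.startswith(first_letter)


urlList = [
    "www.example.com/spare_parts/M2541.htm",
    "www.example.com/spare_parts/M3511.htm",
    "www.example.com/spare_parts/C6501.htm",
    "www.example.com/spare_parts/M2800.htm",
    "www.example.com/custom_parts/M1808.htm",
    "www.example.com/custom_parts/R2202.htm",
]

matches = sorted(url for url in urlList if is_spare_part(url))
print(matches)

# Hypothetical deeper path: contains "/spare_parts/M" but is not directly under spare_parts
print(is_spare_part("www.example.com/spare_parts/Misc/X1.htm"))  # False
```

For the sample data both approaches select the same three links; the exact check only differs when the directory layout is deeper or less regular than in the question.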