Finding specific links in a file

Date: 2014-01-23 18:44:32

Tags: python

I have some txt files that contain many URLs like these:

www.example.com/spare_parts/M2541.htm
www.example.com/spare_parts/M3511.htm
www.example.com/spare_parts/C6501.htm
www.example.com/spare_parts/M2800.htm
www.example.com/custom_parts/M1808.htm
www.example.com/custom_parts/R2202.htm

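What I want is a sorted list of links. I have managed to load my txt file and read it line by line in Python, but I can't sort it, because all the examples I found search for whole words, and in this case I want the links to all spare parts whose name starts with M. Can anyone help me?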

for line in text:
    if 'spare_parts' in line:
        print texto2(line)
    else:
        print texto3(line)

2 answers:

Answer 0: (score: 1)

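import os

parts = []
FirstLetter = 'M'
# The with block closes the file once the loop is done.
with open('textfile.txt') as fp:
    for line in fp:
        if 'spare_parts' in line:
            # Take the file name and drop its extension. strip('.htm') would
            # remove any of the characters '.', 'h', 't', 'm' from both ends,
            # not the suffix, so splitext is used instead.
            part = os.path.splitext(line.rstrip().split('/')[-1])[0]
            if part.startswith(FirstLetter):
                parts.append(part)

print(sorted(parts))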

Output: ['M2541', 'M2800', 'M3511']
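
The loop above collects part IDs, while the question asks for the links themselves. A minimal sketch of one way to keep the full links, assuming every link ends in folder/file as in the sample data. The function name spare_part_links and the sample list are made up for illustration and do not come from the original answer:

```python
def spare_part_links(lines, first_letter="M"):
    """Return spare_parts links whose file name starts with first_letter,
    sorted by file name. (Hypothetical helper, not from the original answer.)"""
    matches = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        segments = url.split("/")
        # Expect at least .../<folder>/<file>; keep only the spare_parts folder.
        if len(segments) < 2 or segments[-2] != "spare_parts":
            continue
        if segments[-1].startswith(first_letter):
            matches.append(url)
    # Sort on the file name (the last path segment), not on the whole link.
    return sorted(matches, key=lambda u: u.rsplit("/", 1)[-1])


# Sample data copied from the question; a file object can be passed instead.
sample = [
    "www.example.com/spare_parts/M2541.htm",
    "www.example.com/spare_parts/M3511.htm",
    "www.example.com/spare_parts/C6501.htm",
    "www.example.com/spare_parts/M2800.htm",
    "www.example.com/custom_parts/M1808.htm",
    "www.example.com/custom_parts/R2202.htm",
]
links = spare_part_links(sample)
for link in links:
    print(link)
```

Comparing the folder segment exactly, rather than testing whether 'spare_parts' occurs anywhere in the line, avoids matching a link that merely mentions that text somewhere else in its path.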

Answer 1: (score: 0)

If I understand your question correctly, you are looking for a filter:

urlList = [
    "www.example.com/spare_parts/M2541.htm",
    "www.example.com/spare_parts/M3511.htm",
    "www.example.com/spare_parts/C6501.htm",
    "www.example.com/spare_parts/M2800.htm",
    "www.example.com/custom_parts/M1808.htm",
    "www.example.com/custom_parts/R2202.htm"
]
sparePartsStartingWithMList = [line for line in urlList if ("/spare_parts/M" in line)]

The important line is the last one. It is a list comprehension that returns a list of all the strings containing "/spare_parts/M". The equivalent loop looks like this:

sparePartsStartingWithMList = []

for line in urlList:
    if ("/spare_parts/M" in line):
        sparePartsStartingWithMList.append(line)

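Because of the way your URLs are built, all spare parts sit under the /spare_parts directory and each part's ID is its file name. So you only need to search for the relevant piece of the path: the folder name, followed by the directory separator (/), followed by the first letter of the part ID.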

The value of sparePartsStartingWithMList:

sparePartsStartingWithMList = [
    "www.example.com/spare_parts/M2541.htm",
    "www.example.com/spare_parts/M3511.htm",
    "www.example.com/spare_parts/M2800.htm"
]
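
The path-fragment test described above can be parameterized and combined with sorted(), since the question asks for a sorted list. This is only a sketch: the helper name links_matching and its parameters are assumptions introduced here, not part of the original answer.

```python
def links_matching(url_list, folder, first_letter):
    """Return, in sorted order, the URLs containing '/<folder>/<first_letter>'.
    (Hypothetical helper for illustration.)"""
    # Build the fragment: folder, directory separator, first letter of the ID.
    fragment = "/" + folder + "/" + first_letter
    return sorted(url for url in url_list if fragment in url)


urlList = [
    "www.example.com/spare_parts/M2541.htm",
    "www.example.com/spare_parts/M3511.htm",
    "www.example.com/spare_parts/C6501.htm",
    "www.example.com/spare_parts/M2800.htm",
    "www.example.com/custom_parts/M1808.htm",
    "www.example.com/custom_parts/R2202.htm",
]
spare_m = links_matching(urlList, "spare_parts", "M")
custom_m = links_matching(urlList, "custom_parts", "M")
print(spare_m)
print(custom_m)
```

Because every link shares the same prefix up to the file name, plain string sorting puts the matches in part-ID order. Note that the in test matches the fragment anywhere in the string, which is fine for URLs shaped like the samples but may be too loose for deeper paths.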