Python: remove a list of substrings from a list of strings

Asked: 2016-08-31 19:56:47

Tags: python list

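I am trying to remove multiple substrings from a list of URLs. I have over 300k URLs and want to find which ones are variants of an original. Here is the toy example I have been working with.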

URLs = ['example.com/page.html',
        'www.example.com/in/page.html',
        'example.com/ca/fr/page.html',
        'm.example.com/de/page.html',
        'example.com/fr/page.html']

locs = ['/in', '/ca', '/de', '/fr', 'm.', 'www.']

What I ultimately want is a list of pages without the language or location:

desired_output = ['example.com/page.html',
                  'example.com/page.html',
                  'example.com/page.html',
                  'example.com/page.html',
                  'example.com/page.html']

I have tried list comprehensions and nested for loops, but nothing has worked so far. Can anyone help?

# doesn't remove anything
for item in URLs:
    for string in locs:
        re.sub(string, '', item)

# doesn't remove anything
for item in URLs:
    for string in locs:
        item.strip(string)

# only removes the last string in locs
clean = []
for item in URLs:
    for string in locs:
        new = item.replace(string, '')
    clean.append(new)

3 Answers:

Answer 0 (score: 4)

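You have to assign the result of replace back to item, otherwise each pass of the inner loop starts again from the unmodified string: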

clean = []
for item in URLs:
    for loc in locs:
        item = item.replace(loc, '')
    clean.append(item)

Or, more compactly:

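from functools import reduce  # reduce is not a builtin in Python 3

clean = [
    # fold over locs, starting from the URL and removing one substring per step
    reduce(lambda s, loc: s.replace(loc, ''), locs, item)
    for item in URLs
]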
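
To illustrate why the attempts in the question leave the URLs untouched, here is a minimal sketch. It relies only on the fact that Python strings are immutable, so string methods return a new string rather than changing the one they are called on. The variable names (original, result, trimmed) are chosen for illustration only.

```python
original = 'm.example.com/de/page.html'

# str.replace returns a new string; `original` itself is never modified.
result = original.replace('/de', '')
print(original)  # m.example.com/de/page.html
print(result)    # m.example.com/page.html

# str.strip treats its argument as a set of characters to trim from both
# ends, not as a substring to delete, so here it trims the leading 'm' and
# '.' and also the trailing 'm'.
trimmed = 'm.example.com'.strip('m.')
print(trimmed)   # example.co
```

The same applies to re.sub: it returns the substituted string, so the result has to be stored somewhere for the change to be visible.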

Answer 1 (score: 3)

The biggest problem is that you are not saving the return value.

urls = ['example.com/page.html',
        'www.example.com/in/page.html',
        'example.com/ca/fr/page.html',
        'm.example.com/de/page.html',
        'example.com/fr/page.html']

locs = ['/in', '/ca', '/de', '/fr', 'm.', 'www.']

stripped = list(urls)  # work on a copy so urls stays untouched (optional)

for loc in locs:
    stripped = [url.replace(loc, '') for url in stripped]

After this, stripped equals

['example.com/page.html',
 'example.com/page.html',
 'example.com/page.html',
 'example.com/page.html',
 'example.com/page.html']

Edit

Alternatively, if you do not need to keep the original list, you can rebind urls directly:

for loc in locs:
    urls = [url.replace(loc, '') for url in urls]

After this, urls equals

['example.com/page.html',
 'example.com/page.html',
 'example.com/page.html',
 'example.com/page.html',
 'example.com/page.html']

Answer 2 (score: 2)

You can first abstract the removal into a function, then use a list comprehension:

def remove(target, strings):
    for s in strings:
        target = target.replace(s,'')
    return target

URLs = ['example.com/page.html',
        'www.example.com/in/page.html',
        'example.com/ca/fr/page.html',
        'm.example.com/de/page.html',
        'example.com/fr/page.html']

locs = ['/in', '/ca', '/de', '/fr', 'm.', 'www.']

Usage:

URLs = [remove(url,locs) for url in URLs]

for url in URLs:
    print(url)

Output:

example.com/page.html
example.com/page.html
example.com/page.html
example.com/page.html
example.com/page.html
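
Because the stated goal is to find which URLs are variants of the same original, one possible follow-up is to group the original URLs by their cleaned form. The sketch below reuses the remove function from this answer; the variants dictionary is an illustrative name and not something from the original post.

```python
from collections import defaultdict

def remove(target, strings):
    """Return target with every substring in strings removed, in order."""
    for s in strings:
        target = target.replace(s, '')
    return target

URLs = ['example.com/page.html',
        'www.example.com/in/page.html',
        'example.com/ca/fr/page.html',
        'm.example.com/de/page.html',
        'example.com/fr/page.html']

locs = ['/in', '/ca', '/de', '/fr', 'm.', 'www.']

# Map each cleaned URL to the list of original URLs that reduce to it.
variants = defaultdict(list)
for url in URLs:
    variants[remove(url, locs)].append(url)

for cleaned, originals in variants.items():
    print(cleaned, len(originals))
```

With the toy data every URL lands under the single key 'example.com/page.html'. Keeping the originals alongside the cleaned key avoids losing track of which variant each entry came from.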