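I have a text file containing excluded substrings. I want to iterate over my input items, check each one, and return only the items that contain none of the excluded substrings.
I am on Python 2.4 and use the code below, because with open and any() are not available there. However, I have to hard-code the substrings:
Working code: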
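def filterJunk(x):
    # Keep only the strings that contain none of the hard-coded substrings.
    return [i for i in x if not ('I' in i or 'am' in i or '#junk' in i)]

# Pass the whole list in one call; looping over it and passing each string
# separately would make filterJunk test the string's individual characters.
OutPutList = filterJunk(InputstringsList)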
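Question:
What if I need to exclude n substrings? I obviously do not want to hard-code them, so I am looking for an alternative that reads each substring from the exclusion file and checks that the final values contain none of them.
A value should be returned only if none of the substrings from the text file is found in it.
For example
Input: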
InputstringsList = ["Icantbeapart_of_yourlist_asI_am_in_Junk", "youAre", "MeeToooo", "#junk"]
Expected output:
OutPutList = ["youAre", "MeeToooo"]
Answer 0 (score: 1)
You can use a helper function and the built-in filter() function:
def filterJunk(x, excluded):
    def not_excluded(s):
        for ex in excluded:
            if ex in s:
                return False
        return True
    return filter(not_excluded, x)

stringsList = ["Icantbeapart_of_yourlist_asI_am_in_Junk", "youAre", "MeeToooo", "#junk"]
excluded = 'I', 'am', '#junk'
print filterJunk(stringsList, excluded)  # -> ['youAre', 'MeeToooo']
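The excluded substrings above are still written into the code. As a minimal sketch of reading them from the exclusion file instead, assuming the file lists one substring per line: the names load_excluded, filter_junk and excluded.txt are hypothetical, and try/finally stands in for with open, which Python 2.4 lacks.

```python
def load_excluded(path):
    """Return the non-empty, stripped lines of the file at `path`."""
    f = open(path)
    try:
        # Explicit try/finally replaces `with open(...)`,
        # which is not available in Python 2.4.
        lines = [line.strip() for line in f]
    finally:
        f.close()
    # Skip blank lines: an empty string is "in" every string,
    # so keeping it would exclude every input item.
    return [line for line in lines if line]


def filter_junk(strings, excluded):
    """Keep only the strings that contain none of the excluded substrings."""
    kept = []
    for s in strings:
        for ex in excluded:
            if ex in s:
                break  # found an excluded substring: drop s
        else:
            kept.append(s)  # inner loop ended without a match: keep s
    return kept
```

With a file that contains I, am and #junk on separate lines, filter_junk(stringsList, load_excluded('excluded.txt')) should give ['youAre', 'MeeToooo'] for the example input.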
Analysis
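Since my answer is considerably longer than one line of code, you might reasonably expect it to be slower than "one-liners" such as the two in @AGN Gazer's answer. Whether that is true depends on which version of Python you are using. This can be seen by benchmarking the different algorithms, which I did using some execution-timing test code I threw together.
For Python 2.7.14, the results are: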
Fastest to slowest execution speeds using Python 2.7.14
(10,000 executions, best of 3 repetitions)
AGN Gazer 1 : 0.009705 secs, rel speed 1.00x, 0.00% slower
martineau : 0.012495 secs, rel speed 1.29x, 28.74% slower
AGN Gazer 2 : 0.045498 secs, rel speed 4.69x, 368.79% slower
But they are quite different with Python 3.6.3:
Fastest to slowest execution speeds using Python 3.6.3
(10,000 executions, best of 3 repetitions)
martineau : 0.003329 secs, rel speed 1.00x, 0.00% slower
AGN Gazer 1 : 0.017841 secs, rel speed 5.36x, 435.99% slower
AGN Gazer 2 : 0.034160 secs, rel speed 10.26x, 926.29% slower
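The timing code used for these numbers is not shown here. As a rough sketch of how a comparable measurement could be made with the standard timeit module, under some assumptions: the function names and the best_time helper are hypothetical, passing a callable to timeit.Timer needs Python 2.6 or later, and filter() is wrapped in list() because in Python 3 it returns a lazy iterator, which would otherwise make the two versions do different amounts of work.

```python
import timeit

strings_list = ["Icantbeapart_of_yourlist_asI_am_in_Junk", "youAre", "MeeToooo", "#junk"]
excluded = ('I', 'am', '#junk')


def helper_filter(strings, excluded):
    """Helper function plus filter(), forced to a list on every Python version."""
    def not_excluded(s):
        for ex in excluded:
            if ex in s:
                return False
        return True
    return list(filter(not_excluded, strings))


def nested_comprehension(strings, excluded):
    """One-liner: keep x when the list of matching substrings is empty."""
    return [x for x in strings if not [e for e in excluded if e in x]]


def best_time(func, number=10000, repeat=3):
    """Best total time in seconds over `repeat` runs of `number` calls each."""
    timer = timeit.Timer(lambda: func(strings_list, excluded))
    return min(timer.repeat(repeat=repeat, number=number))


for func in (helper_filter, nested_comprehension):
    print("%-22s %.6f secs" % (func.__name__, best_time(func)))
```

Absolute times depend on the machine and the interpreter, so only the relative ordering within one run is meaningful.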
Answer 1 (score: 1)
One-liners (not necessarily the most optimal approach):
[x for x in stringsList if not [e for e in excluded if e in x]]
or
from itertools import dropwhile
[x for x in stringsList if not list(dropwhile(lambda t: t not in x, excluded))]
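Both one-liners keep x only when no excluded substring occurs in it. For readers on Python 2.5 or later, where any() is available (it is not in the asker's Python 2.4), the same test can be written as the sketch below; the variable names are illustrative.

```python
strings_list = ["Icantbeapart_of_yourlist_asI_am_in_Junk", "youAre", "MeeToooo", "#junk"]
excluded = ('I', 'am', '#junk')

# any() stops at the first excluded substring found in s,
# so it does not build an intermediate list for each item.
kept = [s for s in strings_list if not any(ex in s for ex in excluded)]
print(kept)  # ['youAre', 'MeeToooo']
```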