Question

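I'm familiar with comparing two lists of integers or strings; however, comparing two lists of strings becomes harder when the strings contain extra characters.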

Suppose the output contains the following, which I split into a list of strings. I call it diff in my code.

Output

164c164
< Apples = 
---
> Apples = 0
168c168
< Berries = 
---
> Berries = false
218c218
< Cherries = 
---
> Cherries = 20
223c223
< Bananas = 
---
> Bananas = 10
233,234c233,234
< Lemons = 2
< Strawberries = 4
---
> Lemons = 4
> Strawberries = 2
264c264
< Watermelons = 
---
> Watermelons = 524288

The second list of strings, ignore, contains the variable names that I want to filter out of the first list.

>>> ignore
['Apples', 'Lemons']

My code:

>>> def str_compare (ignore, output):
...     flag = 0
...     diff = output.strip ().split ('\n')
...     if ignore:
...         for line in diff:
...             for i in ignore:
...                 if i in line:
...                     flag = 1
...             if flag:
...                 flag = 0
...             else:
...                 print (line)
... 
>>>

The code works for Apples and Lemons.

>>> str_compare(ignore, output)
164c164
---
168c168
< Berries = 
---
> Berries = false
218c218
< Cherries = 
---
> Cherries = 20
223c223
< Bananas = 
---
> Bananas = 10
233,234c233,234
< Strawberries = 4
---
> Strawberries = 2
264c264
< Watermelons = 
---
> Watermelons = 524288
>>>

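There must be a better way to compare the two lists of strings that is not O(n^2). If my diff list did not contain extra characters such as "Apples =", I could compare the two lists in O(n). Any suggestions or ideas for doing the comparison without looping over the ignore variable for every diff element?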

Update #1: To avoid confusion and to apply the suggestions from the comments, I have updated the code.

>>> def str_compare (ignore, output):
...     diff = output.strip ().split ('\n')
...     if ignore:
...         for line in diff:
...             if not any ([i in line for i in ignore]):
...                 print (line)
...                 print ("---")
>>>

Either way, there are still two nested loops: ignore is traversed for every element of diff.

Answer 1

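For efficiency, make ignore a set rather than a list, so that each membership test is O(1) on average. Use split to extract the key from each line.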

>>> def str_compare (ignore, output):
...     ignore = set (ignore)
...     diff = output.strip ().split ('\n')
...     for line in diff:
...         if line.startswith('<') or line.startswith('>'):
...             var = line.split () [1]
...             if var not in ignore:
...                 print (line)
...         else:
...             print (line)
...

Output

>>> str_compare (ignore, output)
164c164
---
168c168
< Berries = 
---
> Berries = false
218c218
< Cherries = 
---
> Cherries = 20
223c223
< Bananas = 
---
> Bananas = 10
233,234c233,234
< Strawberries = 4
---
> Strawberries = 2
264c264
< Watermelons = 
---
> Watermelons = 524288

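You can eliminate the need for the flag by splitting the output on "---\n" and processing the resulting chunks (a slightly more general solution than relying on a flag or on the < and > markers).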
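One possible reading of the split-on-"---\n" suggestion, as a minimal sketch rather than the answerer's actual code. The function name filter_diff and the helper keep are hypothetical, and the sketch assumes the input is normal-format diff output in which "---" appears only as a separator line.

```python
def filter_diff(ignore, output):
    """Return the diff text with '<'/'>' lines for ignored names removed."""
    ignored = set(ignore)  # set membership is O(1) on average

    def keep(line):
        parts = line.split()
        # Only '<' and '>' lines carry a variable name in second position.
        if len(parts) >= 2 and parts[0] in ("<", ">"):
            return parts[1] not in ignored
        return True  # hunk headers such as '164c164' are always kept

    # Assumption: '---' occurs only as the separator between the old and
    # new side of a hunk, never inside a value.
    chunks = output.strip().split("---\n")
    kept = [
        "\n".join(line for line in chunk.splitlines() if keep(line))
        for chunk in chunks
    ]
    return "\n---\n".join(kept)
```

No flag is needed because each line is judged on its own by keep; the "---" separators are restored when the chunks are joined back together.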

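Note that, in the worst case, testing whether string s2 contains s1 costs about len(s1) * len(s2), whereas an equality test costs at most about max(len(s1), len(s2)). The Python implementation is quite good in the general case, and algorithms with linear complexity appear to exist: http://monge.univ-mlv.fr/~mac/Articles-PDF/CP-1991-jacm.pdf See also "Algorithm to find multiple string matches".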
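If substring matching is still wanted (as in the question's `i in line` test), one way to avoid an explicit Python-level loop over ignore is to compile all the names into a single regular expression. This is only a sketch: build_ignore_pattern and filter_lines are hypothetical names, and because Python's re module is a backtracking engine this moves the inner loop into C without changing the worst-case bound. A dedicated multi-pattern algorithm such as Aho-Corasick would be needed for a guaranteed linear scan.

```python
import re


def build_ignore_pattern(ignore):
    # Escape each name so characters such as '.' are matched literally.
    alternation = "|".join(re.escape(name) for name in ignore)
    return re.compile(alternation)


def filter_lines(ignore, output):
    """Return the lines of output that mention none of the ignored names."""
    lines = output.strip().split("\n")
    if not ignore:
        # An empty alternation would match every line, so skip filtering.
        return lines
    pattern = build_ignore_pattern(ignore)
    return [line for line in lines if not pattern.search(line)]
```

Unlike the set-based lookup, this keeps the question's substring semantics, so a name such as "Apples" would also filter out a line about "Pineapples".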
