Question

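First question on Stack. I'm working through the first few challenges on HackerRank and am stuck on "how many deletions are needed to make two words anagrams". I've seen some other solutions online, but I can't figure out why mine is so much slower. I seem to have a "correct" algorithm, since I worked through a few test cases by hand and got the expected output.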

def number_needed(a, b):
    count = 0
    isFound = False
    matchedBs = []

    for letterA in a:
        for j,letterB in enumerate(b):
            if letterA == letterB and (j not in matchedBs):
                isFound = True
                matchedBs.append(j)
                break
        if not isFound:
            count += 1
        isFound = False

    return count + (len(b)-len(matchedBs))      

a = input().strip()
b = input().strip()
print(number_needed(a, b))

So I'm trying to figure out whether the general idea of my algorithm is the bottleneck, or whether it's some bug. Thanks!

Answer 1

You can use line profiling here. You can install the profiler with conda install line_profiler.

First, put your function into a script and decorate it with @profile. Here is the script:

# number_needed.py

from string import ascii_letters
import numpy as np

@profile
def number_needed(a, b):
    count = 0
    isFound = False
    matchedBs = []

    for letterA in a:
        for j,letterB in enumerate(b):
            if letterA == letterB and (j not in matchedBs):
                isFound = True
                matchedBs.append(j)
                break
        if not isFound:
            count += 1
        isFound = False

    return count + (len(b)-len(matchedBs))

np.random.seed(123)
s1 = ''.join(np.random.choice(list(ascii_letters), size=500).tolist())
s2 = ''.join(np.random.choice(list(ascii_letters), size=500).tolist())

def main():
    return number_needed(s1, s2)

if __name__ == '__main__':
    main()

Then run the following command in IPython / JupyterQt. You may need to change the path depending on the contents of your directory:

%run C:/Users/YOURNAME/Anaconda3/pkgs/line_profiler-2.0-py36_0/Lib/site-packages/kernprof.py -l -v number_needed.py

The results show some useful statistics.

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           @profile
     6                                           def number_needed(a, b):
     7         1            5      5.0      0.0      count = 0
     8         1            3      3.0      0.0      isFound = False
     9         1            2      2.0      0.0      matchedBs = []
    10                                           
    11       501          468      0.9      0.2      for letterA in a:
    12    136321       133965      1.0     46.4          for j,letterB in enumerate(b):
    13    136229       151929      1.1     52.6              if letterA == letterB and (j not in matchedBs):
    14       408          371      0.9      0.1                  isFound = True
    15       408          585      1.4      0.2                  matchedBs.append(j)
    16       408          425      1.0      0.1                  break
    17       500          472      0.9      0.2          if not isFound:
    18        92          105      1.1      0.0              count += 1
    19       500          459      0.9      0.2          isFound = False
    20                                           
    21         1            5      5.0      0.0      return count + (len(b)-len(matchedBs))

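It looks like the nested for j,letterB in enumerate(b): loop is your culprit. That line and the comparison below it are each evaluated about 136,000 times. The statements inside the if branch take a similar time per hit (see the Per Hit column), but they are hardly ever evaluated, so overall they cost you very little time.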

The runtime doesn't seem too bad, though: 14.7 ms for s1 and s2 on my machine.
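
If line_profiler is not available, a rough wall-clock figure can also be obtained with timeit from the standard library. The following is only a sketch: it generates the test strings with random instead of numpy (so they are not the same s1 and s2 as above), and the variable names runs and total are made up for this example. The number it prints will differ from machine to machine.

```python
import random
import timeit
from string import ascii_letters


def number_needed(a, b):
    # The function from the question, unchanged.
    count = 0
    isFound = False
    matchedBs = []

    for letterA in a:
        for j, letterB in enumerate(b):
            if letterA == letterB and (j not in matchedBs):
                isFound = True
                matchedBs.append(j)
                break
        if not isFound:
            count += 1
        isFound = False

    return count + (len(b) - len(matchedBs))


# Assumption: stdlib random with a fixed seed, not the numpy strings above.
rng = random.Random(123)
s1 = ''.join(rng.choice(ascii_letters) for _ in range(500))
s2 = ''.join(rng.choice(ascii_letters) for _ in range(500))

runs = 5  # hypothetical repeat count, chosen only to keep the example fast
total = timeit.timeit(lambda: number_needed(s1, s2), number=runs)
print(f"{total / runs * 1000:.1f} ms per call")
```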

Answer 2

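Your code has O(n³) complexity (n being the length of a and b): you loop over every character in a, compare it against every character in b, and then check whether that index is in the list of already matched characters, a lookup that also has linear complexity.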

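As a quick fix, you can make matchedBs a set, which reduces the complexity to O(n²). But you can do much better: just count all the individual characters in a and b. Don't use str.count, or you'll be back at O(n²); instead, use a dict mapping characters to their counts, loop over a and b once each, and update those counts accordingly. Finally, just sum the absolute differences between the counts for a and b.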
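
Both suggestions might look roughly like the sketch below. The function names number_needed_set and number_needed_counts are invented for this example. The first keeps the original structure and only swaps the list for a set; the second uses a single dict that is incremented for characters of a and decremented for characters of b, so the leftover absolute values are the deletions needed.

```python
def number_needed_set(a, b):
    """Quick fix: same algorithm, but matched indices are stored in a set."""
    count = 0
    matched_bs = set()  # membership test is O(1) on average, not O(n)
    for letter_a in a:
        for j, letter_b in enumerate(b):
            if letter_a == letter_b and j not in matched_bs:
                matched_bs.add(j)
                break
        else:
            # Inner loop ended without break: no unmatched partner in b.
            count += 1
    return count + (len(b) - len(matched_bs))


def number_needed_counts(a, b):
    """Linear version: count characters once, then sum the differences."""
    counts = {}
    for ch in a:
        counts[ch] = counts.get(ch, 0) + 1
    for ch in b:
        counts[ch] = counts.get(ch, 0) - 1
    # A nonzero entry means that many surplus characters in a or in b.
    return sum(abs(n) for n in counts.values())


print(number_needed_set("cde", "abc"))     # 4
print(number_needed_counts("cde", "abc"))  # 4
```

For "cde" and "abc", d and e must be deleted from the first word and a and b from the second, which gives 4.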

Alternatively, using Python's libraries, you can create two collections.Counter objects for a and b and compare them.
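
One possible way to do that comparison is sketched below; the name number_needed_counter is made up here. It relies on the fact that subtracting one Counter from another keeps only the positive counts, so each direction yields the surplus characters of one word.

```python
from collections import Counter


def number_needed_counter(a, b):
    """Deletions needed so that a and b become anagrams of each other."""
    count_a, count_b = Counter(a), Counter(b)
    # Counter subtraction drops zero and negative results, so
    # (count_a - count_b) is the surplus in a, and vice versa.
    surplus_a = count_a - count_b
    surplus_b = count_b - count_a
    return sum(surplus_a.values()) + sum(surplus_b.values())


print(number_needed_counter("cde", "abc"))  # 4
```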

HackerRank "Making Anagrams" challenge fails with a timeout
