Question

I have a set of strings holding students' test answers and the corresponding answer key.

Student answers: ABDEDCAB
Answer key:      ABCCCABB

I want to return a list of 0s and 1s: 0 where the student got the answer wrong and 1 where the answer is correct.

Return = [1,1,0,0,0,0,0,1]

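The problem here is not finding a solution but finding an efficient one. Each student has 180 answers and there are 6 million students, so any iteration I try takes forever.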

%%time
for j in dfw['CN_respostas'].head(10000).index:
    lista = []
    for i in range(len(dfw['CN_respostas'][j])):
        # Compare row j's answer with row j's key (index j, not 0)
        if dfw['CN_respostas'][j][i] == dfw['CN_gabarito'][j][i]:
            lista.append(1)
        else:
            lista.append(0)

Just 10,000 iterations over 45 answers each gave this result:

CPU times: user 14.8 s, sys: 396 ms, total: 15.2 s
Wall time: 17.6 s

Thanks

Answer 1

The most direct (and, in my tests, the fastest) approach is probably a list comprehension with zip:

import time

a = 'ABDEDCAB'
b = 'ABCCCABB'


start = time.time()
for i in range(10000000):
    # Yields booleans; True/False behave as 1/0, or wrap in int() for 0/1
    c = [x == y for x, y in zip(a, b)]
end = time.time()
print(end - start)

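On my machine this takes about 9.5 seconds for 10 million comparisons of 8 answers each.

Edit: I have now tested 6 million sets of 180 answers, and it runs in about 75 seconds on my machine.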
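
To reproduce a measurement like this on your own machine, the standard-library timeit module is a convenient alternative to hand-rolled time.time() calls. This is only a minimal sketch: the helper name compare and the repetition count are assumptions chosen for illustration, and the timings will differ from machine to machine.

```python
import timeit

a = 'ABDEDCAB'
b = 'ABCCCABB'


def compare(student, key):
    # Element-wise comparison; yields booleans (True == 1, False == 0).
    return [x == y for x, y in zip(student, key)]


# Time 100,000 comparisons; raise `number` for a longer, steadier run.
elapsed = timeit.timeit(lambda: compare(a, b), number=100_000)
print(f'{elapsed:.3f} s for 100,000 comparisons')
```

Scaling the string length to 180 and the repetition count toward the real data size should give a rough estimate of the full run time.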

Answer 2

You can use zip for this:

s1 = 'ABDEDCAB'
s2 = 'ABCCCABB'

[int(x==y) for x,y in zip(s1,s2)]
# Out[70]: [1, 1, 0, 0, 0, 0, 0, 1]

Applied to a DataFrame:

def func(s1,s2):
    return [int(x==y) for x,y in zip(s1,s2)]

df.apply(lambda row: func(row['solution'], row['answer']), axis=1)
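
Row-wise apply with axis=1 calls a Python function once per row, so with millions of rows it may be worth pulling the two columns out as plain lists and zipping them directly. The following is a sketch under assumptions, not a benchmarked recommendation: score_all is a hypothetical helper, and the two sample lists stand in for columns extracted with something like df['solution'].tolist() and df['answer'].tolist().

```python
def score(answers, key):
    """Return 1 for each matching position and 0 for each mismatch."""
    return [int(a == k) for a, k in zip(answers, key)]


def score_all(answer_col, key_col):
    """Score many students at once; both arguments are sequences of strings."""
    return [score(a, k) for a, k in zip(answer_col, key_col)]


# Hypothetical sample data standing in for two DataFrame columns.
student_answers = ['ABDEDCAB', 'ABCCCABB']
answer_keys = ['ABCCCABB', 'ABCCCABB']

results = score_all(student_answers, answer_keys)
print(results)
```

The resulting list of lists can be assigned back to a new column if needed. Note that zip stops at the shorter string, so answers and keys of unequal length are silently truncated.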

Comparing two strings and efficiently returning the differences as 0s and 1s in Python
