Is there any difference between writing
    33 <= cp <= 47
and
    cp >= 33 and cp <= 47
or similar expressions?
More specifically, given a function like this:
    def _is_punctuation(char):
        """Checks whether `chars` is a punctuation character."""
        cp = ord(char)
        if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or
                (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):
            return True
        else:
            return False
is it the same as this one?

    def is_punctuation(char):
        """Checks whether `chars` is a punctuation character."""
        # Treat all non-letter/number ASCII as punctuation.
        # Characters such as "^", "$", and "`" are not in the Unicode
        # punctuation class but treat them as punctuation anyways, for consistency.
        cp = ord(char)
        if (33 <= cp <= 47) or (58 <= cp <= 64) or (91 <= cp <= 96) or (123 <= cp <= 126):
            return True
        return False
Is there any reason to prefer _is_punctuation() over is_punctuation(), or vice versa?
Would one be faster to compute than the other? If so, how can we verify that? With dis.dis?
P.S. I'm asking because I can't find a reason why the Google AI engineers would prefer the original _is_punctuation implementation at https://github.com/google-research/bert/blob/master/tokenization.py#L386
Answer 0 (score: 1)
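No, they are semantically the same. You can also return the condition directly instead of using an if clause, since it evaluates to a boolean anyway: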
    return (33 <= cp <= 47) or (58 <= cp <= 64) or (91 <= cp <= 96) or (123 <= cp <= 126)
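As a quick sanity check, the two spellings can be compared over every valid code point. This is only a sketch: the helper names explicit and chained are made up for illustration and are not part of the BERT code.

```python
def explicit(cp):
    # Hypothetical helper: the "x >= a and x <= b" spelling from _is_punctuation.
    return ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or
            (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126))


def chained(cp):
    # Hypothetical helper: the chained-comparison spelling, returned directly.
    return (33 <= cp <= 47) or (58 <= cp <= 64) or (91 <= cp <= 96) or (123 <= cp <= 126)


# Collect every code point (0 .. 0x10FFFF) where the two forms disagree.
mismatches = [cp for cp in range(0x110000) if explicit(cp) != chained(cp)]
print(mismatches)  # an empty list means the two forms agree everywhere
```

Because cp is a plain local integer, evaluating it once (chained) or twice (explicit) cannot change the result.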
They (the Google AI engineers) may not have known about chained comparisons, or they wanted it to perform slightly better.
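To check the performance question rather than guess, one option is to look at the bytecode with dis.dis and time both forms with timeit. The sketch below uses two made-up single-range helpers (explicit and chained) rather than the full functions. In some CPython versions the chained form compiles to a few extra stack-manipulation instructions, so any difference is likely tiny and version-dependent; the timings depend on your interpreter and machine.

```python
import dis
import timeit


def explicit(cp):
    # Hypothetical helper: two separate comparisons joined with "and".
    return cp >= 33 and cp <= 47


def chained(cp):
    # Hypothetical helper: one chained comparison.
    return 33 <= cp <= 47


# Print the bytecode of each form so the instruction sequences can be compared.
dis.dis(explicit)
dis.dis(chained)

# Time each form; take the best of several repeats to reduce noise.
for fn in (explicit, chained):
    best = min(timeit.repeat(lambda: fn(40), number=200_000, repeat=5))
    print(f"{fn.__name__}: {best:.4f} s per 200,000 calls")
```

dis.dis only shows what the compiler emitted; timeit is what actually tells you whether the difference is measurable.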