Question

Is there any difference between writing something like 33 <= cp <= 47 and cp >= 33 and cp <= 47?

More specifically, given a function that does this:

def _is_punctuation(char):
  """Checks whether `chars` is a punctuation character."""
  cp = ord(char)
  if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or
      (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):
    return True
  else:
    return False

is it the same as the following?

def is_punctuation(char):
    """Checks whether `chars` is a punctuation character."""
    # Treat all non-letter/number ASCII as punctuation.
    # Characters such as "^", "$", and "`" are not in the Unicode
    # punctuation class but treat them as punctuation anyways, for consistency.
    cp = ord(char)
    if (33 <= cp <= 47) or (58 <= cp <= 64) or (91 <= cp <= 96) or (123 <= cp <= 126):
        return True
    return False

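Is there any reason to prefer _is_punctuation() over is_punctuation(), or vice versa?

Would one be computationally faster than the other? If so, how can we verify it? With dis.dis?

P.S.: I'm asking because I can't find a reason why the Google AI engineers would prefer the original _is_punctuation implementation at https://github.com/google-research/bert/blob/master/tokenization.py#L386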

Answer 1

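No, there is no difference: the two are semantically identical. You can also return the condition directly instead of using an if clause, since it evaluates to a boolean anyway: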

return (33 <= cp <= 47) or (58 <= cp <= 64) or (91 <= cp <= 96) or (123 <= cp <= 126)
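If you want to convince yourself of the equivalence rather than take it on faith, a brute-force check over every Unicode code point is cheap. This is only a sketch: the helper names is_punct_and and is_punct_chained are made up for illustration and take the code point directly instead of a character.

```python
def is_punct_and(cp):
    # Explicit form: two comparisons per range, joined with `and`.
    return ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or
            (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126))


def is_punct_chained(cp):
    # Chained form: `a <= x <= b` is evaluated as `a <= x and x <= b`,
    # with x evaluated only once.
    return (33 <= cp <= 47) or (58 <= cp <= 64) or (91 <= cp <= 96) or (123 <= cp <= 126)


# Collect every code point on which the two predicates disagree.
mismatches = [cp for cp in range(0x110000)
              if is_punct_and(cp) != is_punct_chained(cp)]
print(len(mismatches))
```

An empty mismatches list means both predicates agree on all inputs that ord() can produce.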

They (the Google AI engineers) may not have known about chained comparisons, or they may have wanted it to perform slightly better.
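Whether one form is actually faster is something you can measure yourself, as the question suggests. The sketch below reduces each style to a single range check (the function names with_and and chained are hypothetical), prints the bytecode with dis.dis, and times both with timeit. The opcodes and the timings depend on the Python version and the machine, so treat any difference you see as specific to your setup.

```python
import dis
import timeit


def with_and(cp):
    # Two separate comparisons joined with `and`.
    return cp >= 33 and cp <= 47


def chained(cp):
    # One chained comparison.
    return 33 <= cp <= 47


# Show the bytecode of each form; exact opcodes vary between CPython versions.
dis.dis(with_and)
dis.dis(chained)

# Time each form on the same input; keep the best of several repeats
# to reduce noise from other processes.
for func in (with_and, chained):
    best = min(timeit.repeat(lambda: func(40), number=200_000, repeat=5))
    print(f"{func.__name__}: {best:.4f} s per 200,000 calls")
```

Any gap is likely to be small next to the cost of the function call itself, so readability is a reasonable tiebreaker.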
