Question

I'm trying to find uppercase acronyms in a string. For example, if the input is "I need to see you ASAP, because YOLO, you know", it should return ["ASAP", "YOLO"].

#!/usr/bin/env python3

import string


def acronyms(s):

    s.translate(string.punctuation)
    for i, x in enumerate(s):
        while x.upper():
            print(x)
            i += 1


def main():
    print(acronyms("""I need to see you ASAP, because YOLO, you know."""))


if __name__ == "__main__":
    main()

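I tried to strip out the punctuation, then loop over the string and print each letter when it is uppercase. This resulted in an infinite loop. I'd like to solve this with string operations, so no RegEx.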

Edit:

Changed the punctuation removal for efficiency

From:

exclude = set(string.punctuation)
s = "".join(ch for ch in s if ch not in exclude)

To:

s.translate(string.punctuation)
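A side note on this edit: in Python 3, str.translate expects a translation table (a mapping from code points), not a string of characters to delete, and because strings are immutable its return value has to be assigned back; the bare call above throws the result away. A minimal sketch of the intended deletion, assuming Python 3 and using str.maketrans with its third argument (the characters to remove):

```python
import string

# Third argument to str.maketrans: characters that translate() should delete
table = str.maketrans("", "", string.punctuation)

s = "I need to see you ASAP, because YOLO, you know."
s = s.translate(table)  # translate() returns a new string, so reassign it
print(s)  # I need to see you ASAP because YOLO you know
```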

Answer 1

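A couple of things I'd like to point out. First, your program hangs because while x.upper(): is effectively a while True with no break: x.upper() returns a non-empty string, which is always truthy, so the loop never ends. Second, since you're already using enumerate, incrementing the index yourself is pointless, because enumerate reassigns i on every iteration anyway.

for i, x in enumerate(s):
    i += 1  # has no effect: enumerate overwrites i on the next pass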

This can all be simplified easily, with no enumerate needed:

import string

def acronyms(s):
    # Strip punctuation, then keep the words that are entirely uppercase
    exclude = set(string.punctuation)
    s = "".join(ch for ch in s if ch not in exclude)
    acro = [x for x in s.split() if x.isupper()]
    return acro

Output

['I', 'ASAP', 'YOLO']

Unfortunately, we do get an extra I, which happens not to be an acronym, so one workaround is to make sure x is never a single letter before adding it:

acro = [x for x in s.split() if x.isupper() and len(x) != 1]
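Putting the pieces of this answer together, a minimal runnable version (same logic, with the single-letter filter applied) might look like this:

```python
import string

def acronyms(s):
    # Remove punctuation so "ASAP," becomes "ASAP"
    exclude = set(string.punctuation)
    s = "".join(ch for ch in s if ch not in exclude)
    # Keep all-uppercase words, skipping single letters such as "I"
    return [x for x in s.split() if x.isupper() and len(x) != 1]

print(acronyms("I need to see you ASAP, because YOLO, you know."))
# ['ASAP', 'YOLO']
```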

Answer 2

Your while loop keeps processing the first character and never moves on to the next one.
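For comparison, here is a sketch of the question's character-by-character idea with a loop that does advance: a for loop visits each character exactly once, and letters are accumulated into the current word. This is an editorial illustration, not part of the original answer; it assumes a "word" is a run of alphabetic characters and skips single letters such as "I". It uses only string methods, no regex.

```python
def acronyms(s):
    found = []
    current = ""
    # The appended space acts as a sentinel so the last word is also checked
    for ch in s + " ":
        if ch.isalpha():
            current += ch  # still inside a word: keep collecting letters
        else:
            # Word boundary: keep the word if it is all caps and longer than one letter
            if len(current) > 1 and current.isupper():
                found.append(current)
            current = ""
    return found

print(acronyms("I need to see you ASAP, because YOLO, you know."))
# ['ASAP', 'YOLO']
```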

You'll also want to filter out "I", since single letters usually aren't classed as acronyms.

The str.isupper() method checks the whole string rather than a single character, so I'd suggest calling it on each word. It would look like this:

def acronyms(s):
    words = s.split()
    acronyms = []
    for word in words:
        if word.isupper() and len(word) > 1:
            acronyms.append(word)
    return acronyms
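On the example sentence from the question, the function above returns ['ASAP,', 'YOLO,'], because split() leaves the commas attached and isupper() ignores them. A small variant (an editorial tweak, not part of the original answer) strips leading and trailing punctuation from each word first; it also renames the list so it no longer shadows the function name:

```python
import string

def acronyms(s):
    result = []
    for word in s.split():
        # Drop surrounding punctuation, e.g. the comma in "ASAP,"
        word = word.strip(string.punctuation)
        if word.isupper() and len(word) > 1:
            result.append(word)
    return result

print(acronyms("I need to see you ASAP, because YOLO, you know."))
# ['ASAP', 'YOLO']
```

Note that strip() only touches the ends of each word, so internal punctuation (as in "U.S.A.") is left partly in place.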

Answer 3

I strongly recommend nltk for its excellent tokenization package, which handles edge cases and punctuation very well.

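For a simplified approach that defines an acronym as a token where:

all characters are alphabetic
all characters are uppercase

the following should suffice: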

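from nltk.tokenize import word_tokenize

def get_acronyms(text):
    # word_tokenize splits punctuation into separate tokens; it may need
    # the Punkt tokenizer data to be fetched first via nltk.download()
    return [
        token for token in word_tokenize(text)
        # Both rules: alphabetic characters only, all of them uppercase
        if token.isalpha() and token.isupper()
    ]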
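The snippet above needs nltk and its tokenizer data installed. The two-rule definition itself can be checked with plain string methods; the sketch below (the is_acronym name is hypothetical) applies it to a few hand-picked tokens and shows why splitting punctuation off matters: a token like "ASAP," fails the alphabetic rule. Under this definition "I" also qualifies.

```python
def is_acronym(token: str) -> bool:
    # Rule 1: every character is a letter; rule 2: the letters are all uppercase
    return token.isalpha() and token.isupper()

tokens = ["I", "ASAP", "ASAP,", "YOLO", "know", "U.S."]
print([t for t in tokens if is_acronym(t)])
# ['I', 'ASAP', 'YOLO']
```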

Answer 4

This should work:

def acronyms(x):
    ans = []
    y = x.split(" ")
    for i in y:
        if i.isupper():
            ans.append(i)
    return ans

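isupper() returns True as long as the string has at least one cased character and no lowercase ones, even if it also contains punctuation.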
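A quick check of that behavior, including the requirement that at least one cased character be present:

```python
samples = ["ASAP,", "YOLO.", "I", "ASAp", ",", "123"]

# isupper() is True only if there is at least one cased character
# and none of the cased characters are lowercase
results = [(word, word.isupper()) for word in samples]
for word, upper in results:
    print(repr(word), upper)
```

As a consequence, on the question's example sentence the function above returns ['I', 'ASAP,', 'YOLO,']: the single letter and the trailing commas are kept.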
