Question

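I'm receiving a stream of tweets with Python and would like to extract the last word of each tweet, or find out where to reference it.

For example, in

NC don’t like working together www.linktowtweet.org

get back

together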

Answer 1

I'm not familiar with tweepy, so I'm assuming you have the data in a Python string; there may well be a better answer.

However, given a string in Python, extracting the last word is simple.

Solution 1

Use str.rfind(' '). The idea is to find the space that precedes the last word. Here is an example.

text = "NC don’t like working together"
text = text.rstrip()  # Strip trailing whitespace, which would otherwise confuse the algorithm.
last_word = text[text.rfind(' ')+1:]  # Take every character *after* the last space.
print(last_word)

Note: if the string contains no words, last_word will be an empty string.

This assumes that all words are separated by spaces. To handle newlines and tabs as well, use str.replace to convert them to spaces first. The whitespace characters in Python are \t\n\x0b\x0c\r and the space itself, but I assume only newlines and tabs will show up in Twitter messages.

See also: string.whitespace

So a complete example (wrapped as a function) would be

def last_word(text):
    text = text.replace('\n', ' ')  # Replace newlines with spaces.
    text = text.replace('\t', ' ')  # Replace tabs with spaces.
    text = text.rstrip(' ')  # Remove trailing spaces.
    return text[text.rfind(' ')+1:]

print(last_word("NC don’t like working together"))  # Outputs "together".

This is still probably the best approach for basic parsing. For larger problems there are better ways.
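If other whitespace characters do turn up, one way to cover all of them at once is to map every character in string.whitespace to a space before searching. The following is a minimal sketch and not part of the original answer; the name last_word_any_whitespace is made up for illustration.

```python
import string

# Translation table that maps every whitespace character to a plain space.
_WHITESPACE_TO_SPACE = str.maketrans({ch: " " for ch in string.whitespace})


def last_word_any_whitespace(text):
    """Return the last whitespace-delimited word of text, or '' if there is none."""
    text = text.translate(_WHITESPACE_TO_SPACE).rstrip(" ")
    # rfind returns -1 when there is no space, so the slice starts at 0.
    return text[text.rfind(" ") + 1:]


print(last_word_any_whitespace("NC don’t like working\x0btogether\r\n"))
```

The rest of the logic is the same rfind slice as in the function above; only the normalisation step changes.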

Solution 2

Regular expressions

These are another way to handle strings in Python, and a much more flexible one. Regular expressions (often called regex) use their own language to specify a portion of text.

For example, .*\s(\S+) specifies the last word in a string.

Here it is again with a longer explanation.

.*               # Match as many characters as possible.
\s               # Until a whitespace ("\t\n\x0b\x0c\r ")
(                # Remember the next section for the answer.
\S+              # Match as much of a word (non-whitespace) as possible.
)                # End saved section.

So in Python you would use it as follows.

import re  # Import the regex library.

# Compile the pattern (DOTALL makes . match \n).
LAST_WORD_PATTERN = re.compile(r".*\s(\S+)", re.DOTALL)

def last_word(text):
    m = LAST_WORD_PATTERN.match(text)
    if not m:  # If there was no last word in this text.
        return ''
    return m.group(1)  # Otherwise return the last word.

print(last_word("NC don’t like working together"))  # Outputs "together".
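The annotated pattern above can also be compiled as written by passing re.VERBOSE, which tells Python to ignore whitespace and # comments inside the pattern. This is a small sketch rather than part of the original answer; the names LAST_WORD_VERBOSE and last_word_verbose are illustrative only.

```python
import re

# Same pattern as .*\s(\S+), spelled out with inline comments.
LAST_WORD_VERBOSE = re.compile(
    r"""
    .*      # Match as many characters as possible...
    \s      # ...up to a whitespace character.
    (       # Start of the captured group.
    \S+     # One or more non-whitespace characters (the word).
    )       # End of the captured group.
    """,
    re.DOTALL | re.VERBOSE,
)


def last_word_verbose(text):
    """Return the text after the last whitespace character, or '' if no match."""
    m = LAST_WORD_VERBOSE.match(text)
    return m.group(1) if m else ""


print(last_word_verbose("NC don’t like working together"))
```

One thing to keep in mind: the pattern requires at least one whitespace character before the word, so a string consisting of a single word gives an empty result here, whereas the rfind approach would return that word.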

Even though this approach is less obvious, it has a couple of advantages. First, it is much more customizable. If you wanted to match the final word but not links, the regex r".*\s([^.:\s]+(?!\.\S|://))\b" would match the last word while ignoring a link if that is the last thing in the text.

Example:

import re  # Import the regex library.

# Compile the pattern (DOTALL makes . match \n).
LAST_WORD_PATTERN = re.compile(r".*\s([^.:\s]+(?!\.\S|://))\b", re.DOTALL)

def last_word(text):
    m = LAST_WORD_PATTERN.match(text)
    if not m:  # If there was no last word in this text.
        return ''
    return m.group(1)  # Otherwise return the last word.

print(last_word("NC don’t like working together www.linktowtweet.org"))  # Outputs "together".

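The second advantage of this method is speed.

As you can see on Try it online!, the regex approach is almost as fast as the string manipulation, if not faster in some cases. (I actually found that the regex runs 0.2 usec faster on my machine than in the demo.)

Either way, the regex executes extremely fast even in the simple case, and there is no question that it is faster than any more complex string algorithm implemented in pure Python. So using the regex can also speed up the code.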
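To check the timing on your own machine, something like the following sketch with the standard timeit module should do. The helper names (last_word_str, last_word_re, time_per_call) are made up here, and the numbers will vary with hardware and Python version, so treat the output as indicative only.

```python
import re
import timeit

TEXT = "NC don’t like working together"
LAST_WORD_PATTERN = re.compile(r".*\s(\S+)", re.DOTALL)


def last_word_str(text):
    """String-method version (Solution 1)."""
    text = text.replace('\n', ' ').replace('\t', ' ').rstrip(' ')
    return text[text.rfind(' ') + 1:]


def last_word_re(text):
    """Regex version (Solution 2)."""
    m = LAST_WORD_PATTERN.match(text)
    return m.group(1) if m else ''


def time_per_call(func, number=100_000):
    """Average wall-clock seconds per call of func(TEXT)."""
    return timeit.timeit(lambda: func(TEXT), number=number) / number


if __name__ == "__main__":
    for func in (last_word_str, last_word_re):
        print(f"{func.__name__}: {time_per_call(func) * 1e6:.3f} usec per call")
```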

EDIT: Changed the URL-avoiding regex from

re.compile(r".*\s([^.\s]+(?!\.\S))\b", re.DOTALL)

to

re.compile(r".*\s([^.:\s]+(?!\.\S|://))\b", re.DOTALL)

so that calling last_word("NC don’t like working together http://www.linktowtweet.org") returns together and not http://.

To understand how this regex works, take a look at https://regex101.com/r/sdwpqB/2.
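The effect of the edit can be seen by running both patterns side by side on a tweet that ends in an http:// link. This is only an illustrative sketch; OLD_PATTERN, NEW_PATTERN and last_word_with are names chosen here, not part of the answer's code.

```python
import re

# Pattern before the edit: only excludes dots from the captured word.
OLD_PATTERN = re.compile(r".*\s([^.\s]+(?!\.\S))\b", re.DOTALL)
# Pattern after the edit: also excludes ':' and rejects a word followed by '://'.
NEW_PATTERN = re.compile(r".*\s([^.:\s]+(?!\.\S|://))\b", re.DOTALL)


def last_word_with(pattern, text):
    """Return group 1 of pattern matched against text, or '' if no match."""
    m = pattern.match(text)
    return m.group(1) if m else ''


tweet = "NC don’t like working together http://www.linktowtweet.org"
print(last_word_with(OLD_PATTERN, tweet))  # http://
print(last_word_with(NEW_PATTERN, tweet))  # together
```

With the old pattern the scheme part of the URL slips through because ':' and '/' are allowed inside the captured word; the new pattern forces the match back to the preceding word.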

Answer 2

Simple. If your text is:

So there you go! Now you will get the last word, "together".
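The snippet this answer refers to is not shown. As a guess at the kind of short solution it may have meant, and not necessarily what the author wrote, here is a sketch based on str.split(); the function name last_word_split is hypothetical.

```python
def last_word_split(text):
    """Return the last whitespace-delimited word of text, or '' if there is none."""
    words = text.split()  # With no argument, split on any run of whitespace.
    return words[-1] if words else ''


print(last_word_split("NC don’t like working together"))
```

Because split() with no argument already treats tabs and newlines as separators and ignores trailing whitespace, no extra clean-up step is needed.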

Finding the last word in a tweepy tweet response in Python
