Question

I have short strings (tweets), and I need to extract every mention from the text and return them as a list, including duplicates.

extract_mentions('.@AndreaTantaros-supersleuth! You are a true journalistic professional. Keep up the great work! #MakeAmericaGreatAgain')
['AndreaTantaros']

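How can I remove all the text after the first punctuation mark that follows the '@'? (In this case it would be '-'.) Note that the punctuation character can vary. Please do not use regular expressions.

This is what I have so far: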

def extract_mentions(tweet):
    tweet_list = tweet.split()
    mention_list = []
    for word in tweet_list:
        if '@' in word:
            x = word.index('@')
            y = word[x+1:]
            if not y.isalnum():
                # Strips only the last character, so 'name-suffix!' is not handled.
                y = word[x+1:-1]
            mention_list.append(y)
    return mention_list

This only works when the mention is followed by exactly one extra character.

Answer 1

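Use string.punctuation (a constant in the string module) to get all the punctuation characters.

Skip the leading characters while they are punctuation (otherwise the answer would always be an empty string). Then find the first punctuation character after that.

This uses two loops with opposite conditions, and a set for faster membership tests.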

import string

z = ".@AndreaTantaros-supersleuth! You are a true journalistic professional. Keep up the great work! #MakeAmericaGreatAgain"

# Membership tests are faster against a set than against a string.
spun = set(string.punctuation)

# Skip leading punctuation: find the position of the first non-punctuation character.
start_pos = 0
while start_pos < len(z) and z[start_pos] in spun:
    start_pos += 1

# Advance to the next punctuation character (or the end of the string).
end_pos = start_pos
while end_pos < len(z) and z[end_pos] not in spun:
    end_pos += 1

print(z[start_pos:end_pos])  # AndreaTantaros
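The same skip-then-scan idea can also be written with itertools, which avoids manual index bookkeeping. This is only a sketch of the technique above, not part of the original answer; the name first_token is made up for illustration. Like the loop version, it stops only at punctuation, so whitespace does not end the token.

```python
import string
from itertools import dropwhile, takewhile

PUNCT = set(string.punctuation)

def first_token(text):
    """Return the run of non-punctuation characters after any leading punctuation."""
    # Drop leading punctuation such as '.' and '@'.
    rest = dropwhile(lambda ch: ch in PUNCT, text)
    # Keep characters until the next punctuation character (or the end).
    return "".join(takewhile(lambda ch: ch not in PUNCT, rest))

print(first_token(".@AndreaTantaros-supersleuth! You are a true journalistic professional."))
# AndreaTantaros
```

Because both helpers simply stop when the input runs out, a string made only of punctuation yields an empty string instead of raising an IndexError.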

Answer 2

import string

def extract_mentions(s, delimiters=string.punctuation + string.whitespace):
    mentions = []
    begin = s.find('@')
    while begin >= 0:
        # Scan forward from the character after '@' to the first delimiter.
        end = begin + 1
        while end < len(s) and s[end] not in delimiters:
            end += 1
        mentions.append(s[begin+1:end])
        # Look for the next '@' after the mention just collected.
        begin = s.find('@', end)
    return mentions


>>> print(extract_mentions('.@AndreaTantaros-supersleuth! You are a true journalistic professional. Keep up the great work! #MakeAmericaGreatAgain'))
['AndreaTantaros']
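For comparison, the split-based approach from the question can be repaired along the same lines: split on whitespace, then cut each mention at its first punctuation character. This is a rough sketch and the function name extract_mentions_by_word is hypothetical. Note that string.punctuation includes the underscore, so a handle such as @foo_bar would be cut to foo here (and in the function above); if underscores should be kept, remove '_' from the stop set.

```python
import string

def extract_mentions_by_word(tweet):
    """Hypothetical variant: split on whitespace, trim each mention at its first punctuation."""
    stop = set(string.punctuation)
    mentions = []
    for word in tweet.split():
        # Everything before the first '@' is not a mention, so skip piece 0.
        for piece in word.split('@')[1:]:
            name = []
            for ch in piece:
                if ch in stop:
                    break
                name.append(ch)
            if name:  # ignore a bare '@' with nothing after it
                mentions.append(''.join(name))
    return mentions

print(extract_mentions_by_word('.@AndreaTantaros-supersleuth! Thanks @bob, and again @bob'))
# ['AndreaTantaros', 'bob', 'bob']
```

Duplicates are kept because every match is appended in order, which is what the question asks for.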

Answer 3

Just use a regular expression to match and extract that part of the text.
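The answer gives no code, so here is one possible sketch of what it might look like, even though the question asks to avoid regular expressions. The pattern and the name extract_mentions_re are assumptions for illustration. In Python 3, \w matches letters, digits, and the underscore, so this version keeps underscores in a handle.

```python
import re

# '@' followed by one or more word characters; the group captures the name only.
MENTION_RE = re.compile(r'@(\w+)')

def extract_mentions_re(tweet):
    """Return every mention in order of appearance, duplicates included."""
    return MENTION_RE.findall(tweet)

print(extract_mentions_re('.@AndreaTantaros-supersleuth! You are a true journalistic professional.'))
# ['AndreaTantaros']
```

With a single capturing group, re.findall returns the list of captured names rather than the full matches, so the '@' is not included in the results.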
