Question

I am trying to count certain expressions in tokenized text. My code is:

tokens = nltk.word_tokenize(raw)
print(tokens.count(r"<cash><flow>"))

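'tokens' is a list of tokenized text (partly shown below). But the regular expression here does not work: the output shows 0 occurrences of "cash flow", which is incorrect. I do not get any error message. If I count only 'cash', it works fine.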

'that', 'produces', 'cash', 'flow', 'from', 'operations', ',', 'none', 'of', 'which', 'are', 'currently', 'planned', ',', 'the', 'cash', 'flows', 'that', 'could', 'result', 'from'

Does anyone know what the problem is?

Answer 1

You don't need a regular expression.
Just find the matching keywords in the tokens and count the elements.

Example:
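
A minimal sketch of this membership check, assuming the two keywords are kept in a list called `keywords` and the hits in `keywords_in_tokens` (both names are hypothetical; only `count_keywords_in_tokens` is named in this answer):

```python
tokens = ['that', 'produces', 'cash', 'flow', 'from', 'operations', 'with', 'cash']

# Hypothetical name: the keywords to look for
keywords = ['cash', 'flow']

# Keep each keyword that occurs at least once in the token list
keywords_in_tokens = [word for word in keywords if word in tokens]
count_keywords_in_tokens = len(keywords_in_tokens)

print(keywords_in_tokens)
print(count_keywords_in_tokens)
```

Note that this counts how many distinct keywords are present, not how often each one occurs.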

count_keywords_in_tokens, the number of keywords found in tokens, is 2 because both words are found in the list.

To do this the regex way, you need a string to match the regex pattern against, so join the tokens into one string first. In the example below, the 2 keywords are separated by OR (the pipe character):

import re

tokens = ['that','produces','cash','flow','from','operations','with','cash']
# Join the tokens into one string so the regex has something to search
text = ' '.join(tokens)

# \b word boundaries stop 'cash' or 'flow' from matching inside longer words
pattern = re.compile(r'\b(cash|flow)\b', re.IGNORECASE)

keyword_matches = pattern.findall(text)
count_keyword_matches = len(keyword_matches)
print(keyword_matches)
print(count_keyword_matches)

count_keyword_matches returns 3 because there are 3 matches.
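
As for why the original call prints 0: `list.count` compares each element to its argument with `==`, so the string `"<cash><flow>"` is never treated as a pattern and no single token equals it. The angle-bracket syntax resembles the token patterns used by `nltk.Text.findall`, which is a different API from a plain list. If the goal is to count the two-word expression itself, one possible sketch is to compare consecutive slices of the token list; `count_phrase` is a hypothetical helper, not part of NLTK:

```python
tokens = ['that', 'produces', 'cash', 'flow', 'from', 'operations', 'with', 'cash']

# list.count tests equality against each token, so a pattern-like string matches nothing
print(tokens.count(r"<cash><flow>"))


def count_phrase(tokens, phrase):
    """Count how often the token sequence `phrase` appears consecutively in `tokens`."""
    n = len(phrase)
    if n == 0:
        return 0
    # Slide a window of length n over the tokens and compare each slice
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)


cash_flow_count = count_phrase(tokens, ['cash', 'flow'])
print(cash_flow_count)
```

This matches exact tokens only, so 'cash' followed by 'flows' is not counted; normalizing or stemming the tokens first would be a separate step.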

