Question

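I can reproduce the expected output from this book, page 4, "Searching Text". When I try to apply it to my own case, I get "No matches", which is not the output I expected. I think I am not tokenizing at the right level (words rather than characters), but I am not sure how to correct it. Any suggestions? The output I want is each hyphen, together with its surrounding context, lined up vertically.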

>>> f = open('hyphen.txt')
>>> raw = f.read()
>>> import nltk
>>> tokens = nltk.word_tokenize(raw)
>>> text = nltk.Text(tokens)
>>> text.concordance("-")
No matches
>>> text
<Text: Fog Air-Flow Switch stuck off ? Bubble Tower...>

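(Python 3.4.3)

Edit

I think I am getting close using regular expressions, but I don't know how to get rid of the 'NoneType' object. Any suggestions?

The output I would like to see looks like this: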

                 Fog Air-Flow Switch stuck off?
      Bubble Tower Check-Valve stuck closed?
           Chamber Drain-Trap broken, dry, or missing?
         Chamber Exhaust-Vent blocked or restricted?
 etc.

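It is fine if the context is wider than the sentence containing the hyphen; what matters to me is that the hyphens line up vertically with their surrounding context.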

Answer 1

The code needs a small change.

import nltk
f = open("/path/to/file")  # path of the file
raw = f.read()
text = nltk.Text(raw)  # raw string, so each character becomes a token
text.concordance("-")
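If only the vertical alignment matters, a plain-Python alternative may be simpler than a concordance. The following is a minimal sketch, not part of the original answer. It assumes the input has one sentence per line, and the helper name `align_on_hyphen` and the sample data are made up for illustration. It also shows one way to avoid the 'NoneType' problem mentioned in the question: `re.search` returns `None` when a line has no match, so those lines are skipped before the match object is used.

```python
import re

def align_on_hyphen(lines):
    """Return the lines containing an intra-word hyphen, padded so the hyphens line up."""
    # A hyphen with a word character on both sides (hypothetical pattern choice).
    hyphen = re.compile(r"(?<=\w)-(?=\w)")
    hits = []
    for line in lines:
        line = line.strip()
        match = hyphen.search(line)
        if match:  # search() returns None for lines without a hyphen; skip them
            # Split into the text before the hyphen and the text from the hyphen on.
            hits.append((line[:match.start()], line[match.start():]))
    if not hits:
        return []
    # Right-justify every left part to the widest one so the hyphens share a column.
    width = max(len(left) for left, _ in hits)
    return [left.rjust(width) + right for left, right in hits]

# Sample data taken from the question, plus one line without a hyphen.
sample = [
    "Fog Air-Flow Switch stuck off?",
    "Bubble Tower Check-Valve stuck closed?",
    "No hyphen here",
    "Chamber Drain-Trap broken, dry, or missing?",
]
aligned = align_on_hyphen(sample)
print("\n".join(aligned))
```

Only the first hyphen on each line is used as the alignment point; a line with several hyphenated words would need to be emitted once per match if every hyphen should appear.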

Desired output:
