Question

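I have done a fair amount of lurking on SO and a fair amount of searching and reading, but I must also admit to being a relative noob at programming in general. I am trying to learn as I go, so I have been playing with Python's NLTK. In the script below I can get everything to work, except that it only writes what would be the first screen of a multi-screen output, at least that is how I am thinking about it.

Here is the script: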

#! /usr/bin/env python

import nltk

# First we have to open and read the file:

thefile = open('all_no_id.txt')
raw = thefile.read()

# Second we have to process it with nltk functions to do what we want

tokens = nltk.wordpunct_tokenize(raw)
text = nltk.Text(tokens)

# Now we can actually do stuff with it:

concord = text.concordance("cultural")

# Now to save this to a file

fileconcord = open('ccord-cultural.txt', 'w')
fileconcord.writelines(concord)
fileconcord.close()

Here is the beginning of the output file:

Building index...
Displaying 25 of 530 matches:
y .   The Baobab Tree : Stories of Cultural Continuity The continuity evident 
 regardless of ethnicity , and the cultural legacy of Africa as well . This Af

What am I missing here to get all 530 matches written to the file?

Answer 1

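According to {{3}},

text.concordance(self, word, width=79, lines=25) appears to take additional parameters.

I don't think the size of the concordance index can be extracted, but the manual seems to contain this line: lines = min(lines, len(offsets)), so you can simply pass sys.maxint as the last argument: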

concord = text.concordance("cultural", 75, sys.maxint)
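A note for Python 3 readers: sys.maxint was removed in Python 3, and sys.maxsize is the usual substitute. The sketch below only illustrates the clamp quoted above; the offsets list is made up for the example and is not taken from NLTK.

```python
import sys

# Hypothetical stand-in for the list of match positions (530 matches).
offsets = list(range(530))

# On Python 3, sys.maxsize plays the role that sys.maxint played on Python 2.
requested = sys.maxsize

# The clamp quoted above: never show more lines than there are matches.
lines = min(requested, len(offsets))
print(lines)
```

Any value at least as large as the number of matches has the same effect, so the exact constant does not matter.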

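Added:

Looking at the original code now, I can't see how it could have worked before. text.concordance does not return anything; it outputs everything to stdout using print. So the easy option is to redirect stdout to your file, like this: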

import sys

....

# Open the file
fileconcord = open('ccord-cultural.txt', 'w')
# Save old stdout stream
tmpout = sys.stdout
# Redirect all "print" calls to that file
sys.stdout = fileconcord
# Init the method
text.concordance("cultural", 200, sys.maxint)
# Close file
fileconcord.close()
# Reset stdout in case you need something else to print
sys.stdout = tmpout
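On Python 3 the same redirection can be written with contextlib.redirect_stdout, which restores stdout automatically even if an exception is raised. This is a minimal sketch: print_matches is a hypothetical stand-in for a print-only call such as text.concordance, and an in-memory buffer is used in place of the output file.

```python
import contextlib
import io


def print_matches(word, lines):
    # Hypothetical stand-in for a function that only prints its results.
    for i in range(lines):
        print(f"match {i + 1} for {word}")


# Everything printed inside the "with" block goes to the buffer.
# Replace the buffer with open('ccord-cultural.txt', 'w') to write a real file.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print_matches("cultural", 3)

captured = buffer.getvalue()
```

After the with block, print writes to the console again, so no manual save and restore of sys.stdout is needed.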

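Another option is to use the respective classes directly and omit the Text wrapper. Just copy the bits from the concordance printing code and combine them with the bits from here, and you are done.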
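To show the general idea of working without a print-only wrapper, here is a rough pure-Python sketch that returns the context lines instead of printing them. It is not NLTK's implementation: concordance_lines and its parameters are hypothetical names chosen for this example, and the layout is only an approximation of a concordance view.

```python
def concordance_lines(tokens, word, width=79):
    """Return one context line per case-insensitive match of `word`."""
    # Characters of context to keep on each side of the matched word.
    half = (width - len(word) - 2) // 2
    results = []
    for i, token in enumerate(tokens):
        if token.lower() != word.lower():
            continue
        # Each token is at least one character, so `half` tokens are enough.
        left = " ".join(tokens[max(0, i - half):i])[-half:]
        right = " ".join(tokens[i + 1:i + 1 + half])[:half]
        results.append(f"{left:>{half}} {token} {right}")
    return results


tokens = "The cultural legacy of Africa and its Cultural continuity".split()
lines = concordance_lines(tokens, "cultural", width=31)
output = "\n".join(lines) + "\n"
```

Because the function returns a list of strings, the result can be written to a file with an ordinary write call, and every match is included rather than a fixed number of lines.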

Answer 2

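Update

I found this in the nltk users group: "write text.concordance output to a file Options". It dates from 2010 and states:

The documentation for the Text class says: "is intended to support initial exploration of texts (via the interactive console). ... If you wish to write a program which makes use of these analyses, then you should bypass the Text class, and use the appropriate analysis function or class directly instead."

If nothing has changed in the package since then, this may be the source of your problem.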

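--- Previously ---

I don't see a problem with writing to the file using writelines():

file.writelines(sequence)

Write a sequence of strings to the file. The sequence can be any iterable object producing strings, typically a list of strings. There is no return value. (The name is intended to match readlines(); writelines() does not add line separators.)

Note the last part, about line separators. Have you checked the output file in a different editor? Perhaps the data is there but is not rendered correctly because the line separators are missing?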
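The behaviour described in that documentation excerpt can be seen with a small sketch. The sample strings are made up, and io.StringIO stands in for a real file.

```python
import io

lines = ["first match", "second match"]

# writelines() writes the strings back to back; no separator is added.
joined = io.StringIO()
joined.writelines(lines)

# Appending "\n" to each string puts one match on each line.
separated = io.StringIO()
separated.writelines(line + "\n" for line in lines)

print(repr(joined.getvalue()))
print(repr(separated.getvalue()))
```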

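Are you sure this part is generating the data you want to output?

 concord = text.concordance("cultural")

I am not familiar with nltk, so I am just asking in order to eliminate possible sources of the problem.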
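One quick way to check is to inspect what the call returns. The sketch below uses print_only, a hypothetical stand-in for any function that prints its results and returns nothing, and shows on Python 3 that passing such a result to writelines() fails instead of writing data.

```python
import io


def print_only(word):
    # Hypothetical stand-in: prints to stdout and has no return statement.
    print("Displaying matches for", word)


result = print_only("cultural")
print(result is None)

# writelines() needs an iterable of strings, so None is rejected.
error_raised = False
try:
    io.StringIO().writelines(result)
except TypeError:
    error_raised = True
print(error_raised)
```

If the value assigned from the call is None, the text seen on screen came from print and was never stored in the variable.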

Python: how to capture output to a text file? (only 25 of 530 lines are captured now)
