Question

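I'm using Python (2.7.9) and NLTK (3.2.1) for some natural language processing. At the moment, every time I run my program I part-of-speech tag a large corpus.

The resulting tagged corpus looks like a much larger version of this: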

[('a', 'DT'), ('better', 'JJR'), ('widower', 'JJR'), ('than', 'IN'),
('my', 'PRP$'), ('father', 'NN'), ('.', '.'), ('Aunt', 'NNP'),
('Sybil', 'NNP'), ('had', 'VBD'), ('pink-rimmed', 'JJ'), ('azure',
'JJ'), ('eyes', 'NNS'), ('and', 'CC'), ('a', 'DT'), ('waxen', 'JJ'),
('complexion', 'NN'), ('.', '.'), ('She', 'PRP'), ('wrote', 'VBD'),
('poetry', 'NN'), ('.', '.'), ('She', 'PRP'), ('was', 'VBD'),
('poetically', 'RB'), ('superstitious', 'JJ')]

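Ideally, I would just save this list to a file and then read the file into a variable each time I run the program. Saving the list to a file is easy enough: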

POScorpus = pos_tag(words)

# I convert this to a string so I can write it to a file.
POScorpus_string = str(POScorpus)

# I then write it to a file (raw string, so the backslashes are kept literally).
f = open(r'C:\Desktop\POScorpus.txt', 'w')
f.write(POScorpus_string)
f.close()

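The problem is that when I read the file back into a variable, read() returns the file's contents as a string rather than a list.

My question is simple: how do I read the file as a list rather than a string? I imagine this is relatively simple, but I can't find any documentation on how to do it.

(Apologies if this is off-topic or a silly question.)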

Answer 1

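You can use the eval() function to convert the string back into a list. That said, it is not the most efficient or memory-friendly way to solve the problem.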
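A minimal sketch of the string round trip, assuming the file was written with str() as in the question. It swaps eval() for ast.literal_eval() from the standard library, which accepts only Python literals (strings, numbers, tuples, lists, dicts and so on) and therefore will not run arbitrary code that happens to be in the file. The sample data and variable names are illustrative, not the asker's actual corpus.

```python
import ast

# A small stand-in for the tagged corpus (illustrative data).
tagged = [('She', 'PRP'), ('wrote', 'VBD'), ('poetry', 'NN'), ('.', '.')]

# This is the text that ends up in the file when the list is written with str().
text = str(tagged)

# literal_eval parses the text back into a list of tuples.
restored = ast.literal_eval(text)

print(type(restored))
print(restored == tagged)
```

With a real file, `text` would simply be the result of `f.read()`.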

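A better option is Python's pickle or cPickle module. "Pickling" is the process of saving a Python object (for example, a list or dictionary) as a byte stream, which can later be quickly loaded back into a variable without losing or altering its object type. Pickling is also known as "serialization" or "marshalling".

Here is an example: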

#HOW TO PICKLE THE POS-TAGGED CORPUS

#Pickling involves saving a Python object to a file (without first converting
#it to a string).

#Let's pickle TaggedCorpus so we can use it efficiently later:

import cPickle                                 #imports fast pickle module (written in C)

f = open(r'C:\Desktop\TaggedCorpus.p', 'wb')   #creates pickle file f (binary mode)
cPickle.dump(TaggedCorpus, f)                  #dumps data of TaggedCorpus object to f
f.close()

#To unpickle the object, simply load the file into a variable:

f = open(r'C:\Desktop\TaggedCorpus.p', 'rb')   #opens the pickle file for reading (binary mode)
TaggedCorpus = cPickle.load(f)                 #loads the content of f as TaggedCorpus
f.close()
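The same round trip can be sketched in a form that runs on both Python 2 and Python 3. This is only a sketch under assumptions: the temporary file path and the sample list are placeholders, not the asker's real corpus or location. The file is opened in binary mode ('wb' / 'rb'), which pickle requires on Python 3 and which is the safe choice on Windows, and the with blocks close the file automatically.

```python
import os
import tempfile

try:
    import cPickle as pickle   # Python 2: C implementation
except ImportError:
    import pickle              # Python 3: the C accelerator is used automatically

# Illustrative stand-in for the tagged corpus.
tagged_corpus = [('She', 'PRP'), ('wrote', 'VBD'), ('poetry', 'NN'), ('.', '.')]

# Placeholder location; substitute the real path to your pickle file.
path = os.path.join(tempfile.gettempdir(), 'TaggedCorpus_example.p')

# Write the object as a byte stream.
with open(path, 'wb') as f:
    pickle.dump(tagged_corpus, f, pickle.HIGHEST_PROTOCOL)

# Read it back; the result is a list of tuples again, not a string.
with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded == tagged_corpus)

# Clean up the example file.
os.remove(path)
```

Only unpickle files you created yourself, since loading a pickle from an untrusted source can execute arbitrary code.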

Answer 2

You can use eval(your_string) to convert the string to a collection.
