How do I split/parse the contents of a text file (a single list) and write it to multiple lists using Python 3?

Asked: 2018-10-25 20:14:53

Tags: python string python-3.x list file

Suppose I have read a file (a .txt file) in Python 3, so that its entire contents are in a single list. Next, I need to parse the contents of that single list into multiple lists, based on the whitespace and the \n sentence breaks of the original file.

I then need to write and save a separate new file that contains the resulting list of lists.

Once this is done, I should have 2 files in the directory: one containing the entire text in a single list, and the other containing only the list of lists.

I have tried this but have not succeeded yet; any help would be appreciated.
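
Roughly, the behaviour I am after is something like the simplified sketch below (the file names "sample.txt", "b1_single_list.txt" and "b1_list_of_lists.txt" are just placeholders, not my real paths):

# Read the whole file into memory (placeholder file name).
with open("sample.txt", encoding="utf-8") as f:
    text = f.read()

# One single list: every whitespace-separated token in the file.
single_list = text.split()

# A list of lists: one inner list per line, split on whitespace,
# using the \n breaks of the original file.
list_of_lists = [line.split() for line in text.splitlines() if line.strip()]

# Save each structure to its own file (placeholder file names).
with open("b1_single_list.txt", "w", encoding="utf-8") as out1:
    out1.write(str(single_list))

with open("b1_list_of_lists.txt", "w", encoding="utf-8") as out2:
    for inner in list_of_lists:
        out2.write(str(inner) + "\n")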

The code I have written so far is shown below.

import nltk, re
import string
from collections import Counter
from string import punctuation
from nltk.tokenize import TweetTokenizer, sent_tokenize, word_tokenize
from nltk.corpus import gutenberg, stopwords
from nltk.stem import WordNetLemmatizer

def remove_punctuation(from_text):
    # Delete every punctuation character from each token in the list.
    table = str.maketrans('', '', string.punctuation)
    stripped = [w.translate(table) for w in from_text]
    return stripped

def preprocessing():
    with open("I:\\(2018 - 2019)\\College Desktop\\Pen Drive 8 GB\\PDF\\Code\\Books Handbooks\\Books Handbooks Text\\b1.txt", encoding="utf-8") as f:
        # Split the raw text into sentences, then lower-case and
        # whitespace-split each sentence into a list of words.
        tokens_sentences = sent_tokenize(f.read())
        tokens = [[word.lower() for word in line.split()] for line in tokens_sentences]
        # Remove punctuation from every token of every sentence.
        global stripped_tokens
        stripped_tokens = [remove_punctuation(i) for i in tokens]
        sw = stopwords.words('english')
    # Keep only tokens that are not stopwords, are purely alphabetic,
    # and match the regular expression below.
    filter_set = [[token for token in sentence
                   if (token.lower() not in sw
                       and token.isalnum() and token.isalpha()
                       and re.findall(r"[^_ .'\"-[A-Za-z]]+", token))]
                  for sentence in stripped_tokens]
    # Lemmatize every remaining token.
    lemma = WordNetLemmatizer()
    lem = []
    for w in filter_set:
        lem.append([wi for wi in map(lemma.lemmatize, w)])
    return lem
result = preprocessing()
# Write the first three processed sentences to a new file.
with open('I:\\(2018 - 2019)\\College Desktop\\Pen Drive 8 GB\\PDF\\Code\\Books Handbooks\\Books Handbooks Text\\b1_list.txt', "w", encoding="utf-8") as f1:
    for e in result[:3]:
        f1.write(str(e))
preprocessing()

I am frustrated because the program runs without errors, yet the output is not what I want. For example, with the code above I expect the first 3 sentences to be written to the new file.

But when I open the new file, it shows 3 empty lists, something like [] [] []. Why does this happen?
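
To try to narrow this down, I assume the intermediate results can be printed step by step. A minimal sketch of such a check, repeating the same steps used in preprocessing() (with "b1.txt" as a shortened placeholder for my real path), would be:

import re, string
from nltk.tokenize import sent_tokenize
from nltk.corpus import stopwords

with open("b1.txt", encoding="utf-8") as f:  # placeholder path
    tokens_sentences = sent_tokenize(f.read())

# Step 1: lower-case and split each sentence on whitespace.
tokens = [[word.lower() for word in line.split()] for line in tokens_sentences]
print("tokenized:", tokens[0])

# Step 2: strip punctuation from every token.
table = str.maketrans('', '', string.punctuation)
stripped = [[w.translate(table) for w in sent] for sent in tokens]
print("without punctuation:", stripped[0])

# Step 3: apply the same filter condition used for filter_set.
sw = stopwords.words('english')
filtered = [[t for t in sent
             if t.lower() not in sw
             and t.isalnum() and t.isalpha()
             and re.findall(r"[^_ .'\"-[A-Za-z]]+", t)]
            for sent in stripped]
print("after the filter:", filtered[0])  # if this is already [], the filter drops every token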

0 Answers:

There are no answers yet.