How to create a dictionary from zipped lists or a loop

Time: 2016-10-10 17:21:38

Tags: python dictionary zip nltk

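I'm trying to build a dictionary for NLP, where the final output should look like [{text: "blah blah blah"}, "positive"]

But when I try to build the "text: blah blah blah" dictionaries, I get only a single entry as output, even though I'm working with a list.

Here is the setup code.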

import csv

training_text = []
training_tag = []
with open("training.csv", encoding="ISO-8859-1") as csvfile:
    list_reader = csv.reader(csvfile)
    for row in list_reader:
        text = row[0]  # first column: the document text
        tag = row[1]   # second column: the tag
        training_text.append(text)
        training_tag.append(tag)

import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

# build these once instead of on every pass through the loop
stop_words = set(stopwords.words('english'))
p_stemmer = PorterStemmer()

training_text_stem = []

for doc in training_text[1:]: #skip first row, which is header
    #tokenize text
    tok = nltk.word_tokenize(doc)
    text = nltk.Text(tok)

    #normalize words
    words = [w.lower() for w in text if w.isalpha()]

    #build vocabulary
    vocab = sorted(set(words))

    #remove stopwords
    vocab_redux = [w for w in vocab if w not in stop_words]

    #stemming to reduce topically similar words to their root
    vocab_stem = [p_stemmer.stem(i) for i in vocab_redux]

    training_text_stem.append(vocab_stem)

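This is where it breaks down. I've tried it two ways: as a dict-zip comprehension and as a for loop. In both cases, the output is just a single entry rather than the whole list.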

key = ['text']*len(training_text_stem)    
training_dictionary = dict(zip(training_text_stem, key))

The output
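
A note on why the zip attempt cannot produce one entry per document: a dict keeps exactly one value per key, so if every document is stored under the same 'text' key, each pair overwrites the previous one. With the arguments in the order shown, the token lists would be the keys instead, which fails because lists are not hashable. The following is a minimal sketch, assuming the goal is one {'text': ...} dict per document paired with its tag. The sample data and the names `tags`, `collapsed`, and `training_data` are made up for illustration.

```python
# Hypothetical stand-ins for training_text_stem and training_tag[1:]
# (the [1:] would skip the header row, matching the text loop).
training_text_stem = [["good", "movi"], ["bad", "plot"]]
tags = ["positive", "negative"]

# Same key every time: later pairs overwrite earlier ones,
# so only the last document survives.
collapsed = dict(zip(["text"] * len(training_text_stem), training_text_stem))

# One small dict per document, each paired with its tag.
training_data = [({"text": doc}, tag)
                 for doc, tag in zip(training_text_stem, tags)]
```

Here `training_data` is a list of (dict, tag) tuples. If a list pair is preferred, the tuple can be swapped for `[{"text": doc}, tag]`.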

def makeadictionary(document):
    dictionarylist = []
    for doc in document:
        dictionarylist.append({'text': doc})
        return(dictionarylist)

makeadictionary(training_text_stem)

The output
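
In the loop version, the `return` is indented inside the `for` body, so the function exits during the first iteration and hands back a list with one item. A minimal sketch of the same function with the `return` moved out of the loop follows. The name `make_dictionary_list` and the sample input are illustrative only.

```python
def make_dictionary_list(documents):
    """Wrap each document in its own {'text': ...} dict."""
    dictionary_list = []
    for doc in documents:
        dictionary_list.append({"text": doc})
    # return sits outside the loop, so every document is appended first
    return dictionary_list


# Hypothetical stand-in for training_text_stem
result = make_dictionary_list([["good", "movi"], ["bad", "plot"]])
```

The same result can be written as a list comprehension: `[{"text": doc} for doc in documents]`.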

0 answers:

No answers