Adding new values to an empty nested list

Date: 2016-10-16 19:54:50

Tags: python list assign

This is related to "How to append to the end of an empty list?", but I don't have enough reputation to comment there yet, so I'm posting a new question here.

I need to append terms to the empty lists inside a list of lists. I start with:
Talks[eachFilename]['TermVectors'] = [['paragraph', '1', 'text'],
                                      ['paragraph', '2', 'text'],
                                      ['paragraph', '3', 'text']]

I want to end up with:
Talks[eachFilename]['SomeTermsRemoved'] = [['paragraph', 'text'],
                                           ['paragraph', '2'],
                                           ['paragraph']]

Talks[eachFilename]['SomeTermsRemoved'] starts out empty. I can't assign what I want like this:

Talks[eachFilename]['SomeTermsRemoved'][0][0] = 'paragraph'
Talks[eachFilename]['SomeTermsRemoved'][0][1] = 'text'
Talks[eachFilename]['SomeTermsRemoved'][1][0] = 'paragraph'

and so on (IndexError: list index out of range). If I force-fill it with strings and then try to change them, I get an error saying strings are immutable.

So how do I specify that I want Talks[eachFilename]['SomeTermsRemoved'][0] to be ['paragraph', 'text'], Talks[eachFilename]['SomeTermsRemoved'][1] to be ['paragraph', '2'], and so on?

.append works, but it only produces one long column instead of a set of lists.

More specifically, I have some lists initialized in a dict:

Talks = {}
Talks[eachFilename]= {}
Talks[eachFilename]['StartingText']=[]
Talks[eachFilename]['TermVectors']=[]
Talks[eachFilename]['TermVectorsNoStops']=[]

eachFilename comes from a list of text files, for example:

Talks[eachFilename]=['filename1','filename2']
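A minimal sketch of the setup described above, assuming the file names have already been collected into a list (the name filenames is hypothetical) and each file gets its own dict of empty lists, as in the initialization shown earlier:

```python
# Hypothetical list of file names; in the question these come from text files.
filenames = ['filename1', 'filename2']

Talks = {}
for eachFilename in filenames:
    # Each file gets its own dict holding separate, initially empty lists.
    Talks[eachFilename] = {
        'StartingText': [],
        'TermVectors': [],
        'TermVectorsNoStops': [],
    }

print(sorted(Talks))  # ['filename1', 'filename2']
```

Because the dict literal is evaluated on every pass through the loop, no two files share the same list objects.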

StartingText holds several long lines of text (one paragraph each):

Talks['filename1']['StartingText'] = ['This is paragraph one', 'paragraph two']

TermVectors is filled in by the NLTK package with lists of terms, still grouped by their original paragraphs:

Talks['filename1']['TermVectors'] = [['This', 'is', 'paragraph', 'one'],
                                     ['paragraph', 'two']]

I want to manipulate TermVectors further but keep the original list-per-paragraph structure. This code creates a list with one term per line:

for eachFilename in Talks:
    for eachTerm in range( 0, len( Talks[eachFilename]['TermVectors'] ) ):
        for term in Talks[eachFilename]['TermVectors'][ eachTerm ]:
            if unicode(term) not in stop_words:
                Talks[eachFilename]['TermVectorsNoStops'].append( term )

The result (I lose the paragraph structure):

Talks['filename1']['TermVectorsNoStops'] = [['This'],
                                            ['is'],
                                            ['paragraph'],
                                            ['one'],
                                            ['paragraph'],
                                            ['two']]

2 answers:

Answer 0 (score: 0)

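The error you report (strings are immutable?) makes no sense unless your list isn't actually empty but already filled with strings. In any case, if you start with an empty list, the simplest way to fill it is to append: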

>>> talks = {}
>>> talks['each_file_name'] = {}
>>> talks['each_file_name']['terms_removed'] = []
>>> talks['each_file_name']['terms_removed'].append(['paragraph','text'])
>>> talks['each_file_name']['terms_removed'].append(['paragraph','2'])
>>> talks['each_file_name']['terms_removed'].append(['paragraph'])
>>> talks
{'each_file_name': {'terms_removed': [['paragraph', 'text'], ['paragraph', '2'], ['paragraph']]}}
>>> from pprint import pprint
>>> pprint(talks)
{'each_file_name': {'terms_removed': [['paragraph', 'text'],
                                      ['paragraph', '2'],
                                      ['paragraph']]}}
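To see where a "strings are immutable" error can come from, here is a minimal sketch assuming the outer list was pre-filled with empty strings (the values are hypothetical): assigning into a string element fails, while replacing the whole element works because the outer list is mutable.

```python
# Outer list pre-filled with strings instead of lists (hypothetical values).
terms_removed = ['', '', '']

try:
    # terms_removed[0] is a str, and str objects do not support item assignment.
    terms_removed[0][0] = 'paragraph'
except TypeError as err:
    message = str(err)  # keep the text; 'err' is unbound after the except block

print(message)  # 'str' object does not support item assignment

# Replacing the whole element is fine: the outer list itself is mutable.
terms_removed[0] = ['paragraph', 'text']
print(terms_removed)  # [['paragraph', 'text'], '', '']
```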

If you have an empty list and try to assign to it by index, you get an error:

>>> empty_list = []
>>> empty_list[0] = 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range
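If you do want to fill the nested list by index, the inner lists have to exist first. A minimal sketch, assuming the number of paragraphs is known in advance (n_paragraphs is a hypothetical name): build one independent empty list per paragraph with a list comprehension, then append to or replace each inner list. It also shows why [[]] * n is a trap: it repeats one list object n times.

```python
n_paragraphs = 3  # hypothetical: one inner list per paragraph

# n independent empty lists, so terms_removed[0], [1], [2] all exist.
terms_removed = [[] for _ in range(n_paragraphs)]
terms_removed[0].append('paragraph')
terms_removed[0].append('text')
terms_removed[1] = ['paragraph', '2']  # replace a whole inner list
terms_removed[2].append('paragraph')
print(terms_removed)  # [['paragraph', 'text'], ['paragraph', '2'], ['paragraph']]

# Pitfall: [[]] * n holds the SAME list n times, so every "row" changes together.
aliased = [[]] * n_paragraphs
aliased[0].append('paragraph')
print(aliased)  # [['paragraph'], ['paragraph'], ['paragraph']]
```

Note that terms_removed[0][0] = 'paragraph' would still raise IndexError here, because each inner list starts empty; append to the inner list or assign the whole inner list instead.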

By the way, code like this:

for eachFilename in Talks:
    for eachTerm in range( 0, len( Talks[eachFilename]['TermVectors'] ) ):
        for term in Talks[eachFilename]['TermVectors'][ eachTerm ]:
            if unicode(term) not in stop_words:
                Talks[eachFilename]['TermVectorsNoStops'].append( term )

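is far from Pythonic style. Don't use camelCase; use snake_case. Don't capitalize variable names. Also, in your middle for loop you use for eachTerm in range(0, len(Talks[eachFilename]['TermVectors'])), but eachTerm is an int, so it makes more sense to use the standard i, j, or k, or even idx.

In any case, there is no reason the code should turn this:

Talks['filename1']['TermVectors'] = [['This', 'is', 'paragraph', 'one'],
                                     ['paragraph', 'two']]

into this:

Talks['filename1']['TermVectorsNoStops'] = [['This'],
                                            ['is'],
                                            ['paragraph'],
                                            ['one'],
                                            ['paragraph'],
                                            ['two']]

Appending individual strings produces a flat list of strings, not a list of one-element lists. Here is a reproducible example (I've done it for you this time, but you should do this yourself before posting a question), written in a more Pythonic way: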

>>> pprint(talks)
{'file1': {'no_stops': [],
           'term_vectors': [['This', 'is', 'paragraph', 'one'],
                            ['paragraph', 'two']]},
 'file2': {'no_stops': [],
           'term_vectors': [['This', 'is', 'paragraph', 'three'],
                            ['paragraph', 'four']]}}
>>> for file in talks:
...   for i in range(len(talks[file]['term_vectors'])):
...     for term in talks[file]['term_vectors'][i]:
...       if term not in stop_words:
...         talks[file]['no_stops'].append(term)
... 
>>> pprint(file)
'file2'
>>> pprint(talks)
{'file1': {'no_stops': ['This', 'paragraph', 'one', 'paragraph'],
           'term_vectors': [['This', 'is', 'paragraph', 'one'],
                            ['paragraph', 'two']]},
 'file2': {'no_stops': ['This', 'paragraph', 'paragraph', 'four'],
           'term_vectors': [['This', 'is', 'paragraph', 'three'],
                            ['paragraph', 'four']]}}
>>> 
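To keep the paragraph structure instead of flattening it, one option is to iterate over the paragraphs directly (no range(len(...)) index) and append one new list per paragraph. A minimal sketch using the same talks data as the session above; the stop_words set is hypothetical, chosen to be consistent with the flat output shown there:

```python
from pprint import pprint

# Hypothetical stop-word set, consistent with the flat output in the session above.
stop_words = {'is', 'two', 'three'}

talks = {
    'file1': {'no_stops': [],
              'term_vectors': [['This', 'is', 'paragraph', 'one'],
                               ['paragraph', 'two']]},
    'file2': {'no_stops': [],
              'term_vectors': [['This', 'is', 'paragraph', 'three'],
                               ['paragraph', 'four']]},
}

for file_name, data in talks.items():
    # Loop over the paragraph lists themselves instead of their indices.
    for paragraph in data['term_vectors']:
        # Build a fresh list per paragraph, so the grouping survives.
        kept = [term for term in paragraph if term not in stop_words]
        data['no_stops'].append(kept)

pprint(talks)
```

With these inputs, talks['file1']['no_stops'] becomes [['This', 'paragraph', 'one'], ['paragraph']]. A set is used for stop_words because membership tests on a set are fast; a list would give the same result.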

Answer 1 (score: 0)

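Some more experimenting, plus the comments, led me to a solution. Instead of appending each individual term, which produced one long list, I collect the terms for each paragraph into a list and then append that list, like this: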

for eachFilename in Talks:
    for eachTerm in range( 0, len( Talks[eachFilename]['TermVectors'] ) ):
        term_list = [ ]
        for term in Talks[eachFilename]['TermVectors'][ eachTerm ]:
            if unicode(term) not in stop_words:
                term_list.append(term)
        # Append the whole per-paragraph list, not the last term.
        Talks[eachFilename]['TermVectorsNoStops'].append(term_list)
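The difference between adding terms one at a time and adding a whole per-paragraph list comes down to list.extend versus list.append. A minimal sketch with hypothetical sample terms:

```python
term_list = ['paragraph', 'one']  # hypothetical terms from one paragraph

flat = []
flat.extend(term_list)    # adds each term separately, like appending term by term
flat.extend(['paragraph'])
print(flat)    # ['paragraph', 'one', 'paragraph']

nested = []
nested.append(term_list)  # adds the whole list as ONE element
nested.append(['paragraph'])
print(nested)  # [['paragraph', 'one'], ['paragraph']]
```

Creating a new term_list for every paragraph matters too: appending the same list object repeatedly would make every entry in the outer list refer to that one list.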

Thanks, everyone!