Question

I want to fill grundformen with the correct items, based on the list list_of_occurences.

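My for loop does not work as expected. It does not restart from the beginning, and it reads the rows in the reader only once, so it does not fill in the list completely.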

This is what it prints (you can see where entries are missing, because it does not start searching from the beginning of the list each time):

# List_of_occurrences (1 line - wrapped for easier reading)
[['NN', 1328, ('Ziel',)], ['ART', 771, ('der',)], 
 ['$.', 732, ('_',)], ['VVFIN', 682, ('schlagen',)], 
 ['PPER', 592, ('sie',)], ['$,', 561, ('_',)], 
 ['ADV', 525, ('So',)], ['APPR', 507, ('in',)], 
 ['NE', 433, ('Johanna',)], ['$(', 363, ('_',)], 
 ['VAFIN', 334, ('haben',)], ['ADJA', 307, ('tragisch',)], 
 ['ADJD', 278, ('recht',)], ['KON', 228, ('Doch',)], 
 ['VVPP', 194, ('reichen',)], ['VVINF', 161, ('stören',)], 
 ['KOUS', 151, ('Während',)], ['PPOSAT', 120, ('ihr',)], 
 ['PTKVZ', 104, ('weiter',)], ['PRF', 98, ('sich',)], 
 ['APPRART', 90, ('zu',)], ['PTKNEG', 87, ('nicht',)], 
 ['VMFIN', 76, ('sollen',)], ['PIAT', 66, ('kein',)], 
 ['PIS', 65, ('etwas',)], ['PTKZU', 52, ('zu',)], 
 ['PRELS', 51, ('wer',)], ['PROAV', 42, ('dabei',)],  
 ['PDS', 38, ('jener',)], ['PDAT', 37, ('dieser',)], 
 ['PWAV', 30, ('wie',)], ['PWS', 26, ('Was',)], 
 ['CARD', 24, ('drei',)], ['KOKOM', 21, ('wie',)], 
 ['VAINF', 18, ('werden',)], ['KOUI', 15, ('um',)], 
 ['VMINF', 10, ('können',)], ['VVIZU', 10, ('aufklären',)], 
 ['VAPP', 10], ['PTKA', 6], ['PTKANT', 6], ['PWAT', 4], 
 ['VVIMP', 4], ['PRELAT', 4], ['APZR', 3], ['APPO', 2], 
 ['FM', 1]]

# Grundformen (1 line, wrapped for reading)
['Ziel', 'der', '_', 'schlagen', 'sie', '_', 'So', 'in', 'Johanna',
 '_', 'haben', 'tragisch', 'recht', 'Doch', 'reichen', 'stören', 
 'Während', 'ihr', 'weiter', 'sich', 'zu', 'nicht', 'sollen', 'kein', 
 'etwas', 'zu', 'wer', 'dabei', 'jener', 'dieser', 'wie', 'Was', 
 'drei', 'wie', 'werden', 'um', 'können', 'aufklären']

import collections
import csv

occurences = collections.Counter()

with open("material-2.csv", mode='r', newline='', encoding="utf-8") as material:
    reader = csv.reader(material, delimiter='\t', quotechar="\t")
    for line in reader:
        if line:
            occurences[line[5]] += 1
        else:
            pass

list_of_occurences = [list(elem) for elem in occurences.most_common()]

grundformen = []
with open('material-2.csv', mode='r', newline='', encoding="utf-8") as material:
    reader = csv.reader(material, delimiter='\t', quotechar="\t")
    for elem in list_of_occurences:
        for row in reader:
            if row != [] and row[5] == elem[0]:
                grundformen.append(row[2])
                break

iterator = 0
for elem in grundformen:
    list_of_occurences[iterator].insert(2, elem)
    iterator = iterator + 1
    pass

print(list_of_occurences)
print(grundformen)

The whole input file: https://www.dropbox.com/sh/xyktjk4ycm8x6v0/AACou438_eEWx-ZYmByBiqp_a/material-2.csv?dl=0

Part of my input file:

1 Als Als _ _ KOUS _ _ 6 6 CP CP _ _
2 es es _ _ PPER _ 3|Nom|Sg|Neut 6 6 SB SB _ _
3 zu zu _ _ PTKA _ _ 4 4 MO MO _ _
4 schneien schneien _ _ ADJD _ Comp|Dat|Sg|Fem 5 5 MO MO _ _
5 aufgehört aufhören _ _ VVPP _ Psp 6 6 OC OC _ _
6 hatte haben _ _ VAFIN _ 3|Sg|Past|Ind 8 8 MO MO _ _
7 , _ _ _ $, _ _ 8 8 PUNC PUNC _ _
8 verließ verlassen _ _ VVFIN _ 3|Sg|Past|Ind 0 0 ROOT ROOT _ _
9 Johanna Johanna _ _ NE _ Nom|Sg|Masc 8 8 SB SB _ _
10 von von _ _ APPR _ _ 5 5 SBP SBP _ _
11 Rotenhoff Rotenhoff _ _ NE _ Dat|Sg|Neut 10 10 NK NK _ _
12 , _ _ _ $, _ _ 8 8 PUNC PUNC _ _
13 ohne ohne _ _ KOUI _ _ 18 18 CP CP _ _
14 ein ein _ _ ART _ Nom|Sg|Neut 16 16 NK NK _ _
15 rechtes recht _ _ ADJA _ Pos|Nom|Sg|Neut 16 16 NK NK _ _
16 Ziel Ziel _ _ NN _ Nom|Sg|Neut 18 18 OA OA _ _
17 zu zu _ _ PTKZU _ _ 18 18 PM PM _ _
18 haben haben _ _ VAINF _ Inf 8 8 MO MO _ _
19 , _ _ _ $, _ _ 18 18 PUNC PUNC _ _
20 das der _ _ ART _ Nom|Sg|Neut 21 21 NK NK _ _
21 Gutshaus Gutshaus _ _ NN _ Nom|Sg|Neut 16 16 APP APP _ _
22 . _ _ _ $. _ _ 8 8 PUNC PUNC _ _

How can I change the loop so that it fills in everything?

Answer 1

reader = csv.reader(material, delimiter='\t', quotechar="\t")

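Setting quotechar to the same character as the delimiter looks odd. The CSV reader may get confused and either treat all tabs (\t) as delimiters or interpret them all as quote characters.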
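One way to avoid the ambiguity, assuming the file never uses quoting at all, is to turn quoting off with csv.QUOTE_NONE instead of passing a quotechar. This is only a sketch: the two-row sample string below is made up to mimic the layout of material-2.csv, and io.StringIO stands in for the opened file.

```python
import csv
import io

# Hypothetical two-row sample in the same tab-separated layout as material-2.csv.
sample = "1\tAls\tAls\t_\t_\tKOUS\n2\tes\tes\t_\t_\tPPER\n"

# QUOTE_NONE tells the reader that no character acts as a quote character,
# so only the tab delimiter has a special meaning.
reader = csv.reader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)

# Skip empty rows, as the question's code does with `if line:`.
rows = [row for row in reader if row]

print(rows)
```

With the real file, the io.StringIO(sample) argument would be replaced by the file object from open(...).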

Answer 2

The problem is in how you read the csv data.

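Here the data is read into a list, so a second loop could go over it again without opening another file object. But you do not even need to go through the csv data twice: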

import csv
import collections

occurences = collections.Counter()
grundformen = collections.defaultdict(list)

with open("material-2.csv", mode='r', newline='', encoding="utf-8") as material:
    reader = [ln for ln in csv.reader(material, delimiter='\t', quotechar="\t") if ln]
    for line in reader:
        occurences[line[5]] += 1
        grundformen[line[5]].append(line[2])
    list_of_occurences = list(map(list, occurences.most_common()))
    for elem in list_of_occurences:
        elem.append(grundformen[elem[0]][0])

print(occurences)

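Once the csv data has been pulled into a list, you can call break and the next loop still starts at the head of the list. When you loop over the csv.reader directly, you are looping over an iterator, so after a break the next loop resumes where the previous one stopped and continues until the data is exhausted.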
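The difference can be seen in a small sketch. The two-row sample and the names tags, found_with_reader and found_with_list are made up for illustration; column 0 holds a word and column 1 its tag.

```python
import csv
import io

# Hypothetical sample: column 0 is a word, column 1 is its tag.
sample = "Ziel\tNN\nder\tART\n"
tags = ["ART", "NN"]

# Case 1: loop over the reader directly. The reader is an iterator, so after
# `break` the next inner loop resumes at the row after the last match.
reader = csv.reader(io.StringIO(sample), delimiter="\t")
found_with_reader = []
for tag in tags:
    for row in reader:
        if row[1] == tag:
            found_with_reader.append(row[0])
            break

# Case 2: materialize the rows in a list first. Every inner loop starts
# again at the first row, so earlier rows can still be matched.
rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))
found_with_list = []
for tag in tags:
    for row in rows:
        if row[1] == tag:
            found_with_list.append(row[0])
            break

print(found_with_reader)  # ['der']
print(found_with_list)    # ['der', 'Ziel']
```

In the first case the search for NN finds nothing, because the reader was already exhausted while searching for ART. This is the same effect that leaves entries at the end of list_of_occurences unfilled in the question.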
