Question

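I'm trying to read several txt files, count the words, put everything into a dictionary, and then write that dictionary to a new text file, but I'm having trouble with a for loop. When I run the program, all the new files have exactly the same content, and I don't understand why.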

Here is what I have written so far:

import operator

filename = ['file1.txt', 'file2.txt', 'file3.txt']
newfilename = ['newfile1.txt', 'newfile2.txt', 'newfile3.txt']

for l in filename :
    f = open(l, mode = 'r')
    dic = {}
    text = f.readlines()
    for t in text :
        word = sorted(t.split(), key = str.lower)
        for w in word :
            if w not in dic:
                dic[w] = 1
            else :
                dic[w] += 1
    dicsort = sorted(dic.items(), key = operator.itemgetter(1), reverse = True)
    for l2 in newfilename :
        f2 = open(l2, mode = 'w', encoding = 'utf-8')
        for k, v in dicsort :
            f2.write('\t'+ str(k) + '\t\t' + str(v)+'\n')

Edit: Thanks! I used zip and opened the files with `with`, and now it works! :)

Here is the final code:

import operator

filename = ['file1.txt', 'file2.txt', 'file3.txt']
newfilename = ['newfile1.txt', 'newfile2.txt', 'newfile3.txt']

for l, l2 in zip(filename, newfilename) :
    with open(l, mode = 'r') as f:
        with open(l2, mode = 'w', encoding = 'utf-8') as f2 :
            dic = {}
            text = f.readlines()
            for t in text :
                word = sorted(t.split(), key = str.lower)
                for w in word :
                    if w not in dic:
                        dic[w] = 1
                    else :
                        dic[w] += 1
            dicsort = sorted(dic.items(), key = operator.itemgetter(1), reverse = True)
            for k, v in dicsort :
                f2.write('\t'+ str(k) + '\t\t' + str(v)+'\n')

Answer 1

Note: this is not a complete answer, but hopefully it will prevent a few beginner mistakes.

Never open files outside a with statement. That is very bad.
You can iterate over the file object directly.

So the revised (but still not working) code would be:

with
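The two tips can be illustrated with a small sketch (an illustration, not the code from this answer). The helper name `count_words_in_file` is hypothetical; it opens the file in a `with` statement, so the file is closed even if an error occurs, and it loops over the file object line by line instead of calling `readlines()`.

```python
def count_words_in_file(path):
    """Return a dict mapping each whitespace-separated word to its count."""
    counts = {}
    # The with statement closes the file automatically, even on errors.
    with open(path, encoding='utf-8') as f:
        # Iterating over the file object yields one line at a time,
        # so there is no need to build a list with readlines().
        for line in f:
            for word in line.split():
                counts[word] = counts.get(word, 0) + 1
    return counts
```

For example, `count_words_in_file('file1.txt')` returns the counts for a single file; pairing it with the right output file is still a separate step.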

Answer 2

I'll try to explain what your current code does in pseudocode:

for each input file:
  count its words
  for each output file:
    record the word count in the file

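From this it becomes obvious that, for each input file, all of the output files are overwritten with that file's word count, so once the last input file has been processed, every output file contains the last file's counts.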
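To see the overwrite without touching the disk, here is a small simulation. It is a sketch under stated assumptions: a dict named `written` stands in for the output files, and a placeholder string stands in for each input file's real word count.

```python
inputs = ['file1.txt', 'file2.txt', 'file3.txt']
outputs = ['newfile1.txt', 'newfile2.txt', 'newfile3.txt']

# Stand-in for the output files: maps output name -> last content written.
written = {}

for inp in inputs:
    result = 'counts of ' + inp  # placeholder for the real word count
    # Nested loop, as in the question: every output is rewritten each time.
    for out in outputs:
        written[out] = result

print(written)
# {'newfile1.txt': 'counts of file3.txt', 'newfile2.txt': 'counts of file3.txt', 'newfile3.txt': 'counts of file3.txt'}
```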

To fix this, you can use the zip function.

It works like this: list(zip([1, 2, 3], [4, 5, 6])) == [(1, 4), (2, 5), (3, 6)]. (In Python 3, zip returns an iterator, hence the list() call.)
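Applied to the question's two lists, zip pairs each input name with the output name at the same position. One detail worth knowing: zip stops at the shorter list, so a missing output name silently drops an input file.

```python
filename = ['file1.txt', 'file2.txt', 'file3.txt']
newfilename = ['newfile1.txt', 'newfile2.txt', 'newfile3.txt']

pairs = list(zip(filename, newfilename))
print(pairs)
# [('file1.txt', 'newfile1.txt'), ('file2.txt', 'newfile2.txt'), ('file3.txt', 'newfile3.txt')]

# zip stops at the shorter input: file3.txt has no partner here.
print(list(zip(filename, newfilename[:2])))
# [('file1.txt', 'newfile1.txt'), ('file2.txt', 'newfile2.txt')]
```

On Python 3.10 and later, `zip(filename, newfilename, strict=True)` raises a ValueError instead of silently truncating when the lengths differ.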

You can use it like this:

for input_file, output_file in zip(input_files, output_files):
  count words in input
  write to output

Then each word count will be written to only one file.
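A runnable version of that pseudocode might look like the sketch below. The helper names `count_words`, `write_counts` and `main` are hypothetical; the output format mirrors the question's tab-separated lines, sorted by count.

```python
import operator


def count_words(path):
    """Count whitespace-separated words in one input file."""
    counts = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            for word in line.split():
                counts[word] = counts.get(word, 0) + 1
    return counts


def write_counts(counts, path):
    """Write (word, count) pairs to one output file, most frequent first."""
    ordered = sorted(counts.items(), key=operator.itemgetter(1), reverse=True)
    with open(path, 'w', encoding='utf-8') as f:
        for word, n in ordered:
            f.write('\t' + word + '\t\t' + str(n) + '\n')


def main(input_files, output_files):
    # One output per input: zip pairs the two lists by position.
    for input_file, output_file in zip(input_files, output_files):
        write_counts(count_words(input_file), output_file)
```

Calling `main(filename, newfilename)` with the question's lists would process each input file once and write its counts to exactly one output file.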

Answer 3

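You are overwriting your files over and over. I find it easier to build the new file names on the fly instead of predefining them. A working version of the code above is: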

import operator

filename = ['file1.txt', 'file2.txt', 'file3.txt']

for count, l in enumerate(filename):
    dic = {}
    with open(l, mode='r') as f:
        for t in f:
            for w in t.split():
                # Count words case-insensitively.
                if w.lower() not in dic:
                    dic[w.lower()] = 1
                else:
                    dic[w.lower()] += 1
    # Sort by count, most frequent first (sorting dic.items() alone orders by word).
    dicsort = sorted(dic.items(), key=operator.itemgetter(1), reverse=True)
    print(dicsort)
    # Build the output name from the index: newfile1.txt, newfile2.txt, ...
    with open('newfile' + str(count + 1) + '.txt', mode='w') as f2:
        for k, v in dicsort:
            f2.write('\t' + str(k) + '\t\t' + str(v) + '\n')
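If the output names should follow the input names rather than a counter, they can also be derived from each input path. This is a sketch assuming a hypothetical naming scheme where file1.txt becomes file1_counts.txt; `output_name` is an illustrative helper, and `os.path.splitext` splits off the extension.

```python
import os

filename = ['file1.txt', 'file2.txt', 'file3.txt']


def output_name(path, suffix='_counts'):
    """Build an output file name from an input file name (hypothetical scheme)."""
    root, ext = os.path.splitext(path)
    return root + suffix + ext


print([output_name(name) for name in filename])
# ['file1_counts.txt', 'file2_counts.txt', 'file3_counts.txt']

# The index-based scheme from the answer, with enumerate starting at 1:
print(['newfile{}.txt'.format(i) for i, _ in enumerate(filename, start=1)])
# ['newfile1.txt', 'newfile2.txt', 'newfile3.txt']
```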

Hope this helps!

Answer 4

Try this. Each text file in filename contains: There is some great text here

import operator

filename = ['file1.txt', 'file2.txt', 'file3.txt']
newfilename = ['newfile1.txt', 'newfile2.txt', 'newfile3.txt']

for old, new in zip(filename, newfilename):
    dic = dict()    
    with open(old) as o, open(new, 'w') as n:
        words = o.read().split()
        for word in words:
            if word in dic:
                dic[word] += 1
            else:
                dic[word] = 1

        dicsort = sorted(dic.items(), key=operator.itemgetter(1), reverse=True)

        for k, v in dicsort:
            n.write('\t'+ k + '\t\t' + str(v) + '\n') # No need to call `str()` on `k` as `k` is already a string.

Output written to each text file in newfilename:

    great       2
    There       1
    is          1
    some        1
    text        1
    here        1
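The order of the words tied at count 1 is not arbitrary: on Python 3.7+ a dict keeps insertion order, and `sorted()` is stable even with `reverse=True`, so words with equal counts stay in the order they were first seen. A small check, assuming the words were first seen in the order shown in the dict literal below (an assumption made to match the sample output):

```python
import operator

# Word counts in first-seen order (assumed to match the sample output).
dic = {'There': 1, 'is': 1, 'some': 1, 'great': 2, 'text': 1, 'here': 1}

dicsort = sorted(dic.items(), key=operator.itemgetter(1), reverse=True)
print(dicsort)
# [('great', 2), ('There', 1), ('is', 1), ('some', 1), ('text', 1), ('here', 1)]
```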

Answer 5

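If this isn't homework, you can simplify things with the collections.Counter class. As for splitting the file contents into words, I prefer to use a regular expression to grab all the words at once: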

import collections
import re

filename = ['file1.txt', 'file2.txt', 'file3.txt']
newfilename = ['newfile1.txt', 'newfile2.txt', 'newfile3.txt']

pattern = re.compile(r'\w+')
for infilename, outfilename in zip(filename, newfilename):
    with open(infilename) as inf, open(outfilename, 'w') as outf:
        words = re.findall(pattern, inf.read().lower())
        counter = collections.Counter(words)
        for k, v in counter.most_common():
            outf.write('\t{}\t\t{}\n'.format(k, v))

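In my approach, I use re.findall() to get all the lowercased words in a single line. Keep in mind that this line contains three separate calls: .read() reads the file contents, .lower() converts the whole content to lowercase, and re.findall() extracts all the words.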
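To see what that line produces, here is the same chain applied to an in-memory string instead of a file (the sample text is made up). `\w+` matches runs of letters, digits and underscores, so punctuation is dropped, and after `.lower()` "Text" and "text" count as the same word.

```python
import re

pattern = re.compile(r'\w+')
sample = "Here is some text. Text, here!"  # stand-in for inf.read()

words = re.findall(pattern, sample.lower())
print(words)
# ['here', 'is', 'some', 'text', 'text', 'here']
```

Since `pattern` is already compiled, `pattern.findall(sample.lower())` is an equivalent spelling.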

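After that, I use the collections.Counter class to count the words; the resulting counter behaves just like a dictionary. A Counter object has a .most_common() method that returns a list of (word, count) pairs sorted by count in descending order, which is very handy.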
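A short illustration of the Counter behavior described above, using a made-up word list. Looking up a missing word returns 0 rather than raising KeyError, and on Python 3.7+ words with equal counts appear in `most_common()` in the order they were first encountered.

```python
import collections

words = ['here', 'is', 'some', 'text', 'text', 'here']
counter = collections.Counter(words)

print(counter['text'])     # 2, indexed like a dict
print(counter['missing'])  # 0, missing words count as zero
print(counter.most_common())
# [('here', 2), ('text', 2), ('is', 1), ('some', 1)]
print(counter.most_common(1))
# [('here', 2)]
```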

The last thing to do is write the results out.

Overall, this approach leans on the standard library to get the job done in about 10 lines of code.

How to copy the contents of multiple files into other files
