Question

"test.txt" contains two sentences:

sentence1 = A sentence is a grammatical unit consisting of one or more words.

sentence2 = A sentence can also be defined in orthographic terms alone.

count_line = 0
for line in open('C:/Users/Desktop/test.txt'):
    count_line = count_line +1
    fields = line.rstrip('\n').split('\t')
    ##print count_line, fields
    file = open('C:/Users/Desktop/test_words.txt', 'w+')
    count_word = 0
    for words in fields:
        wordsplit = words.split()
        for word in wordsplit:
             count_word = count_word + 1
             print count_word, word
             file.write(str(count_word) + " " + word + '\n')
        file.close()

My result in "test_words.txt" shows only the words from the second sentence:

1 A 
2 sentence
3 can
4 also
5 be
6 defined
7 in
8 orthographic
9 terms
10 alone.

How can I write the words of the first sentence to "test_words.txt", followed by the words of the second sentence?

Any suggestions?

Answer 1

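In your code, you open and close the output file on every pass through the loop. Because mode 'w+' truncates the file each time it is opened, the words written for the first sentence are overwritten. The simple fix is to open the file only once and close it only once.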

count_line = 0
# Open outside the loop
file = open('C:/Users/Desktop/test_words.txt', 'w+')
for line in open('C:/Users/Desktop/test.txt'):
    count_line = count_line +1
    fields = line.rstrip('\n').split('\t')
    ##print count_line, fields
    count_word = 0
    for words in fields:
        wordsplit = words.split()
        for word in wordsplit:
            count_word = count_word + 1
            print count_word, word
            file.write(str(count_word) + " " + word + '\n')
# Also close outside the loop
file.close()
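
To see the overwrite in isolation, here is a minimal sketch in Python 3 syntax (the code above is Python 2). The function names and the temporary directory are hypothetical, chosen only for illustration, and the two sentences are taken from the question.

```python
import os
import tempfile

SENTENCES = [
    "A sentence is a grammatical unit consisting of one or more words.",
    "A sentence can also be defined in orthographic terms alone.",
]


def write_reopening_each_time(path, sentences):
    """Reopen the output in 'w' mode for every sentence (the bug)."""
    for sentence in sentences:
        with open(path, "w") as out:  # 'w' truncates the file on each open
            for number, word in enumerate(sentence.split(), 1):
                out.write("{0} {1}\n".format(number, word))


def write_opening_once(path, sentences):
    """Open the output a single time, outside the loop (the fix)."""
    with open(path, "w") as out:
        for sentence in sentences:
            for number, word in enumerate(sentence.split(), 1):
                out.write("{0} {1}\n".format(number, word))


def count_lines(path):
    """Return how many lines the file at path contains."""
    with open(path) as handle:
        return len(handle.readlines())


# Hypothetical scratch directory so no real files are touched.
workdir = tempfile.mkdtemp()
buggy_path = os.path.join(workdir, "buggy.txt")
fixed_path = os.path.join(workdir, "fixed.txt")

write_reopening_each_time(buggy_path, SENTENCES)
write_opening_once(fixed_path, SENTENCES)

print(count_lines(buggy_path))  # only the second sentence survives
print(count_lines(fixed_path))  # both sentences are present
```

The first file ends up with only the words of the last sentence, which matches the result described in the question.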

Answer 2

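This happens because the original text is not kept when you open the file a second time. When you open a file for writing in Python (mode 'w' or 'w+'), you overwrite its contents unless you first store them in a variable and write them back.

Try this code: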

count_line = 0
for n, line in enumerate(open('test.txt')):
    count_line = count_line +1
    fields = line.rstrip('\n').split('\t')
    ##print count_line, fields
    already_text = open('test_words.txt').read() if n > 0 else ''
    file = open('test_words.txt', 'w+')
    count_word = 0
    file.write(already_text)
    for words in fields:
        wordsplit = words.split()
        for word in wordsplit:
            count_word = count_word + 1
            print count_word, word
            file.write(str(count_word) + " " + word + '\n')
    # Close once per line, after every tab-separated field has been written
    file.close()

This is the output when I run it:

1 A
2 sentence
3 is
4 a
5 grammatical
6 unit
7 consisting
8 of
9 one
10 or
11 more
12 words.
1 A
2 sentence
3 can
4 also
5 be
6 defined
7 in
8 orthographic
9 terms
10 alone.

Here is the code without enumerate():

count_line = 0
n = 0
for line in open('test.txt'):
    count_line = count_line +1
    fields = line.rstrip('\n').split('\t')
    ##print count_line, fields
    already_text = open('test_words.txt').read() if n > 0 else ''
    file = open('test_words.txt', 'w+')
    count_word = 0
    file.write(already_text)
    for words in fields:
        wordsplit = words.split()
        for word in wordsplit:
            count_word = count_word + 1
            print count_word, word
            file.write(str(count_word) + " " + word + '\n')
    # Close once per line, after every tab-separated field has been written
    file.close()
    n += 1
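
Reading the old contents back and rewriting them works, but append mode may be a simpler route. The sketch below is an alternative, not this answer's method. It assumes Python 3 and that the output file should start empty on each run; append_numbered_words and the temporary paths are hypothetical names.

```python
import os
import tempfile


def append_numbered_words(in_path, out_path):
    """Write each line's words, numbered from 1, reopening the output per line."""
    # Start from an empty output file so earlier runs do not accumulate.
    open(out_path, "w").close()
    with open(in_path) as source:
        for line in source:
            # 'a' appends to the end instead of truncating like 'w' or 'w+'.
            with open(out_path, "a") as out:
                for number, word in enumerate(line.split(), 1):
                    out.write("{0} {1}\n".format(number, word))


# Hypothetical scratch files standing in for test.txt and test_words.txt.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "test.txt")
out_path = os.path.join(workdir, "test_words.txt")

with open(in_path, "w") as handle:
    handle.write("A sentence is a grammatical unit consisting of one or more words.\n")
    handle.write("A sentence can also be defined in orthographic terms alone.\n")

append_numbered_words(in_path, out_path)

with open(out_path) as handle:
    result = handle.read().splitlines()

print(result[0])   # first word of the first sentence
print(result[12])  # numbering restarts with the second sentence
```

As in the code above, the numbering restarts at 1 for each line of the input.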

Answer 3

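Where possible, you should use with when working with files. It is a context manager that makes sure files are closed properly once you are done with them (signaled by leaving the indented block). Here we use enumerate with its optional start argument, which is one way (of several) to keep the counter running as you move on to the next line: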

# Open the file
with open('test.txt', 'rb') as f:
  # Open the output (in Python 2.7+, this can be done on the same line)
  with open('test_words.txt', 'wb') as o:
    # Set our counter
    counter = 1
    # Iterate through the file
    for line in f:
      # Strip out newlines and split on whitespace
      words = line.strip().split()
      # Start our enumeration, which will return the index (starting at 1) and
      # the word itself
      for index, word in enumerate(words, counter):
        # Write the word to the file
        o.write('{0} {1}\n'.format(index, word))
      # Increment the counter
      counter += len(words)

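Or, if you want fewer lines: this uses readlines() to read the file into a list of its lines. Each line is then split on whitespace and every word is pulled out. This means you are essentially iterating over a list of all the words in the file, and because that is combined with enumerate, you do not need to increment a counter yourself: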

# Open the file
with open('test.txt', 'rb') as f:
  # Open the output (in Python 2.7+, this can be done on the same line)
  with open('test_words.txt', 'wb') as o:
    # Iterate through the file
    for i, w in enumerate((x for l in f.readlines() for x in l.strip().split()), 1):
      o.write('{0} {1}\n'.format(i, w))

With Python 2.7+, both files can be opened in a single with statement:

# Open the file
with open('test.txt', 'rb') as f, open('test_words.txt', 'wb') as o:
  # Iterate through the file
  for i, w in enumerate((x for l in f.readlines() for x in l.strip().split()), 1):
    o.write('{0} {1}\n'.format(i, w))
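
The snippets above open both files in binary mode, which works in Python 2. In Python 3, writing a str to a file opened with 'wb' raises a TypeError, so a port would likely use text mode instead. The following is a hedged Python 3 sketch of the same running-counter idea; number_words and the temporary paths are hypothetical names, not part of the answer.

```python
import os
import tempfile


def number_words(in_path, out_path):
    """Number every word in the input with one counter that runs across lines."""
    total = 0
    with open(in_path) as f, open(out_path, "w") as o:
        # Flatten the file into a stream of words, reading one line at a time.
        words = (word for line in f for word in line.split())
        for total, word in enumerate(words, 1):
            o.write("{0} {1}\n".format(total, word))
    # total holds the last index used, or 0 if the input had no words.
    return total


# Hypothetical scratch files standing in for test.txt and test_words.txt.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "test.txt")
out_path = os.path.join(workdir, "test_words.txt")

with open(in_path, "w") as handle:
    handle.write("A sentence is a grammatical unit consisting of one or more words.\n")
    handle.write("A sentence can also be defined in orthographic terms alone.\n")

total = number_words(in_path, out_path)

with open(out_path) as handle:
    numbered = handle.read().splitlines()

print(total)          # number of words written
print(numbered[12])   # first word of the second sentence keeps counting
```

Iterating over the file object directly, rather than calling readlines(), avoids holding every line in memory at once; for a two-line file the difference is negligible.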

Answer 4

This may be beside the point, but I would suggest a cleaner way of writing this. You do not need three loops:

lines = open('test.txt').readlines()
file = open('test_words.txt', 'w+')
for line in lines:
  words = line.rstrip('\n').split()

  # Start at 1 so the printed number matches the number written to the file
  for i, word in enumerate(words, 1):
    print i, word
    file.write('%d %s\n' % (i, word))
file.close()

How do I write words from a list to a file?