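How do I write words from a list to a file?

Time: 2012-12-17 18:23:41

Tags: python

"test.txt" contains two sentences:

sentence1 = A sentence is a grammatical unit consisting of one or more words.

sentence2 = A sentence can also be defined in orthographic terms alone.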

count_line = 0
for line in open('C:/Users/Desktop/test.txt'):
    count_line = count_line +1
    fields = line.rstrip('\n').split('\t')
    ##print count_line, fields
    file = open('C:/Users/Desktop/test_words.txt', 'w+')
    count_word = 0
    for words in fields:
        wordsplit = words.split()
        for word in wordsplit:
             count_word = count_word + 1
             print count_word, word
             file.write(str(count_word) + " " + word + '\n')
        file.close()

My result in "test_words.txt" shows only the words from the second sentence:

1 A 
2 sentence
3 can
4 also
5 be
6 defined
7 in
8 orthographic
9 terms
10 alone.

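How can I write the words of the first sentence to "test_words.txt", followed by the words of the second sentence?

Any suggestions?

4 answers:

Answer 0: (score: 3)

In your code you open and close the output file once per input line, so each pass overwrites what was written for the first sentence. The simple fix is to open the file only once and close it only once.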

count_line = 0
# Open outside the loop
file = open('C:/Users/Desktop/test_words.txt', 'w+')
for line in open('C:/Users/Desktop/test.txt'):
    count_line = count_line +1
    fields = line.rstrip('\n').split('\t')
    ##print count_line, fields
    count_word = 0
    for words in fields:
        wordsplit = words.split()
        for word in wordsplit:
            count_word = count_word + 1
            print count_word, word
            file.write(str(count_word) + " " + word + '\n')
# Also close outside the loop
file.close()
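
For readers on Python 3, the same open-once idea can be sketched with a `with` block, which closes both files automatically. This is only an illustration: the function name `write_numbered_words`, the `demo` helper, and the temporary file paths are assumptions, not part of the original answer. The logic (split on tabs, then on whitespace, restart the count on each line) mirrors the code above.

```python
import os
import tempfile


def write_numbered_words(src_path, dst_path):
    """Write every word of src_path to dst_path, numbered per line.

    Hypothetical helper (not from the original answer).
    """
    # Both files are opened exactly once, before the loop starts.
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            count_word = 0  # numbering restarts on each input line
            for field in line.rstrip('\n').split('\t'):
                for word in field.split():
                    count_word += 1
                    dst.write('{0} {1}\n'.format(count_word, word))


def demo():
    """Run the helper on two sample sentences in a temporary directory."""
    folder = tempfile.mkdtemp()
    src_path = os.path.join(folder, 'test.txt')
    dst_path = os.path.join(folder, 'test_words.txt')
    with open(src_path, 'w') as f:
        f.write('A sentence is a unit.\nA sentence can be defined.\n')
    write_numbered_words(src_path, dst_path)
    with open(dst_path) as f:
        return f.read()


if __name__ == '__main__':
    print(demo())
```

Because the output file is opened in `'w'` mode only once, the truncation happens a single time, before any word is written.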

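Answer 1: (score: 0)

This happens because when you open the file the second time, its original text is not kept. Opening a file in Python with mode 'w' or 'w+' truncates it, so you overwrite its contents unless you first store them in a variable and write them back.

Try this code: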

count_line = 0
for n, line in enumerate(open('test.txt')):
    count_line = count_line +1
    fields = line.rstrip('\n').split('\t')
    ##print count_line, fields
    already_text = open('test_words.txt').read() if n > 0 else ''
    file = open('test_words.txt', 'w+')
    count_word = 0
    file.write(already_text)
    for words in fields:
        wordsplit = words.split()
        for word in wordsplit:
             count_word = count_word + 1
             print count_word, word
             file.write(str(count_word) + " " + word + '\n')
    file.close()

This is the output when I run it:

1 A
2 sentence
3 is
4 a
5 grammatical
6 unit
7 consisting
8 of
9 one
10 or
11 more
12 words.
1 A
2 sentence
3 can
4 also
5 be
6 defined
7 in
8 orthographic
9 terms
10 alone.

Here is the code without enumerate():

count_line = 0
n = 0
for line in open('test.txt'):
    count_line = count_line +1
    fields = line.rstrip('\n').split('\t')
    ##print count_line, fields
    already_text = open('test_words.txt').read() if n > 0 else ''
    file = open('test_words.txt', 'w+')
    count_word = 0
    file.write(already_text)
    for words in fields:
        wordsplit = words.split()
        for word in wordsplit:
             count_word = count_word + 1
             print count_word, word
             file.write(str(count_word) + " " + word + '\n')
    file.close()
    n += 1
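
Re-reading the file on every pass works, but append mode ('a') is one alternative worth knowing: unlike 'w' or 'w+', it does not truncate the file when it is opened. The sketch below is Python 3 and uses made-up helper names (`write_lines`, `demo`) and temporary paths, none of which come from the original answer. It reopens the output once per line, as the question does, and contrasts the two modes.

```python
import os
import tempfile


def write_lines(path, lines, mode):
    """Reopen `path` once per line with the given mode (hypothetical helper)."""
    for line in lines:
        with open(path, mode) as out:
            # Number the words of this line starting at 1.
            for count_word, word in enumerate(line.split(), 1):
                out.write('{0} {1}\n'.format(count_word, word))


def demo(mode):
    """Return the file contents after writing two sample lines with `mode`."""
    # A fresh temporary directory, so the output file starts out absent.
    path = os.path.join(tempfile.mkdtemp(), 'test_words.txt')
    write_lines(path, ['first line', 'second one here'], mode)
    with open(path) as f:
        return f.read()


if __name__ == '__main__':
    print(demo('w'))  # only the words of the last line survive
    print(demo('a'))  # the words of both lines are kept
```

One caveat with 'a': the output file has to be removed or emptied before the first write, otherwise repeated runs of the script keep adding to it.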

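Answer 2: (score: 0)

Whenever possible, use with when working with files: it is a context manager that makes sure the files are closed properly once you are done with them (which is signalled by leaving the indented block). Here we use enumerate with its optional start argument, which is one way (of several) to keep the counter running as we move on to the next line: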

# Open the file
with open('test.txt', 'rb') as f:
  # Open the output (in Python 2.7+, this can be done on the same line)
  with open('test_words.txt', 'wb') as o:
    # Set our counter
    counter = 1
    # Iterate through the file
    for line in f:
      # Strip out newlines and split on whitespace
      words = line.strip().split()
      # Start our enumeration, which will return the index (starting at 1) and
      # the word itself
      for index, word in enumerate(words, counter):
        # Write the word to the file
        o.write('{0} {1}\n'.format(index, word))
      # Increment the counter
      counter += len(words)

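Or, if you want fewer lines: this uses readlines() to read the file into a list whose items are the newline-separated lines. Each line is then split on whitespace and every word is pulled out. In effect you iterate over a single list of all the words in the file, and combined with enumerate you no longer need to increment a counter yourself: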

# Open the file
with open('test.txt', 'rb') as f:
  # Open the output (in Python 2.7+, this can be done on the same line)
  with open('test_words.txt', 'wb') as o:
    # Iterate through the file
    for i, w in enumerate((x for l in f.readlines() for x in l.strip().split()), 1):
      o.write('{0} {1}\n'.format(i, w))

With Python 2.7:

# Open the file
with open('test.txt', 'rb') as f, open('test_words.txt', 'wb') as o:
  # Iterate through the file
  for i, w in enumerate((x for l in f.readlines() for x in l.strip().split()), 1):
    o.write('{0} {1}\n'.format(i, w))
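
One thing to be aware of: the snippets in this answer number the words continuously across lines, whereas the question's code restarts at 1 for every line. Below is a minimal Python 3 sketch of both behaviours, working on an in-memory list of lines instead of a file. The function name `number_words` and its `restart_per_line` flag are made up for illustration.

```python
def number_words(lines, restart_per_line=False):
    """Yield (index, word) pairs for every word in `lines`.

    Hypothetical helper for illustration only.
    """
    counter = 1
    for line in lines:
        words = line.strip().split()
        if restart_per_line:
            counter = 1  # begin again at 1 for each line
        # enumerate's second argument sets the first index it hands out
        for index, word in enumerate(words, counter):
            yield index, word
        counter += len(words)  # carry the running total to the next line


if __name__ == '__main__':
    sample = ['A sentence is', 'A sentence can']
    print(list(number_words(sample)))
    print(list(number_words(sample, restart_per_line=True)))
```

Writing the pairs to a file is then a matter of looping over the generator inside a `with open(...)` block, as in the snippets above.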

Answer 3: (score: 0)

This may be beside the point, but I suggest a cleaner way to write this. You don't need three loops:

lines = open('test.txt').readlines()
file = open('test_words.txt', 'w+')
for line in lines:
  words = line.rstrip('\n').split()

  # Start at 1 so the printed number matches the one written to the file
  for i, word in enumerate(words, 1):
    print i, word
    file.write('%d %s\n' % (i, word))
file.close()