In test.txt, I have two lines, each containing one sentence:
The heart was made to be broken.
There is no surprise more magical than the surprise of being loved.
My code:
import re
file = open('test.txt','r')#specify file to open
data = file.readlines()
file.close()
print "---------------------------------------------------"
count = 0
for line in data:
    line_split = re.findall(r'[^ \t\n\r, ]+',line)
    count = count + 1
    def chunks(line_split, n):
        for i in xrange(0, len(line_split), n):
            yield line_split[i:i+n]
    separate_word = list(chunks(line_split, 8))
    for i, word in enumerate(separate_word, 1):
        print count, ' '.join(word)
    print "---------------------------------------------------"
Output of the code:
---------------------------------------------------
1 The heart was made to be broken.
---------------------------------------------------
2 There is no surprise more magical than the
2 surprise of being loved.
---------------------------------------------------
Is there a way to display the sentence number only on the first line of each sentence?
Expected result:
---------------------------------------------------
1 The heart was made to be broken.
---------------------------------------------------
2 There is no surprise more magical than the
  surprise of being loved.
---------------------------------------------------
Answer 0 (score: 1)
Just check whether it is the first chunk of the line:
for i, word in enumerate(separate_word):
    if i == 0:
        print count, ' '.join(word)
    else:
        print " ", ' '.join(word)
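I would strongly recommend using the with statement to open the file. It is more readable and takes care of closing the file for you, even when an exception is raised.
Another good idea is to loop directly over the file. This is better because it does not load the whole file into memory at once, which is unnecessary and can cause problems with large files.
You should also use enumerate() when looping over data, so that you do not have to maintain count by hand.
You are also defining chunks() again on every iteration, which is pointless; define it once at the top instead. Where it is called, there is no need to build a list either, since we can iterate directly over the generator.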
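The point about chunks() can be seen in isolation. This is a minimal sketch written for Python 3 (so range replaces xrange and print is a function), not the asker's exact code; the parameter names are my own, and the sample sentence is the second line of test.txt.

```python
def chunks(items, n):
    """Yield successive slices of at most n elements from items."""
    for start in range(0, len(items), n):
        yield items[start:start + n]


words = "There is no surprise more magical than the surprise of being loved.".split()

# chunks() is defined once, and the generator is consumed lazily:
# no list() call is needed just to loop over it.
for group in chunks(words, 8):
    print(' '.join(group))
```

Wrapping the call in list() is only necessary if the chunks are needed more than once or have to be indexed.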
If we correct all of this, we get the cleaner:
import re

def chunks(line_split, n):
    for i in xrange(0, len(line_split), n):
        yield line_split[i:i+n]

print "---------------------------------------------------"
with open("test.txt", "r") as file:
    for count, line in enumerate(file, 1):
        line_split = re.findall(r'[^ \t\n\r, ]+',line)
        separate_word = chunks(line_split, 8)
        for i, word in enumerate(separate_word):
            if i == 0:
                print count, ' '.join(word)
            else:
                print " ", ' '.join(word)
        print "---------------------------------------------------"
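It is also worth noting that the variable names are a bit misleading: word, for example, is not a word but a list of up to eight words.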
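To illustrate the naming point, here is one possible Python 3 port with more descriptive names. It is a sketch under my own assumptions rather than the code from this answer: format_numbered, words_per_row and SEPARATOR are hypothetical names, the separator width is arbitrary, and the function takes any iterable of lines (an open file works) and returns the rows instead of printing them.

```python
import re

SEPARATOR = "-" * 40  # arbitrary width, chosen for illustration


def chunks(items, n):
    """Yield successive slices of at most n elements from items."""
    for start in range(0, len(items), n):
        yield items[start:start + n]


def format_numbered(lines, words_per_row=8):
    """Return output rows with each sentence numbered on its first row only."""
    rows = [SEPARATOR]
    for number, line in enumerate(lines, 1):
        words = re.findall(r'[^ \t\n\r,]+', line)
        for index, row_words in enumerate(chunks(words, words_per_row)):
            # Continuation rows are padded to the width of the number.
            prefix = str(number) if index == 0 else " " * len(str(number))
            rows.append(prefix + " " + " ".join(row_words))
        rows.append(SEPARATOR)
    return rows


sample = [
    "The heart was made to be broken.\n",
    "There is no surprise more magical than the surprise of being loved.\n",
]
print("\n".join(format_numbered(sample)))
```

Padding with the width of the number, rather than a fixed string of spaces, keeps continuation rows aligned once the count reaches two digits.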
Answer 1 (score: 0)
Python has text wrapping built in. I admit the formatting below is not perfect, but you will get the idea :-)
#!/usr/bin/env python
import sys
import textwrap

with open('test.txt') as fd:
    T = [line.strip() for line in fd]
for n, s in enumerate(T):
    print '-'*42
    sys.stdout.write("%d " % n)
    for i in textwrap.wrap(s, 45):
        sys.stdout.write("%s\n" % i)
print '-'*42
Output:
------------------------------------------
0 The heart was made to be broken.
------------------------------------------
1 There is no surprise more magical than the
surprise of being loved.
------------------------------------------
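The missing indentation on continuation lines in the output above can probably be handled by textwrap itself, through its initial_indent and subsequent_indent options. The following is a hedged Python 3 sketch, not part of the original answer; wrap_numbered is a hypothetical helper name, and the width of 45 is simply carried over from the code above. Note that the width includes the indent.

```python
import textwrap


def wrap_numbered(sentence, number, width=45):
    """Wrap sentence, putting the number on the first line only."""
    prefix = "%d " % number
    return textwrap.fill(
        sentence,
        width=width,
        initial_indent=prefix,                # e.g. "2 " before the first line
        subsequent_indent=" " * len(prefix),  # same width, spaces only
    )


print(wrap_numbered(
    "There is no surprise more magical than the surprise of being loved.", 2))
```

Passing enumerate(T, 1) instead of enumerate(T) would start the numbering at 1, as in the question.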