Split a string every 8 words. If the 8th word does not end in (. or !), move on to the next word.
I can already split the string into words.
with open("file.txt") as c:
    for line in c:
        text = line.split()
        n = 8
        listword = [' '.join(text[i:i+n]) for i in range(0, len(text), n)]
        for lsb in listword:
            print(lsb)
The expected output should be
I'm going to the mall for breakfast, Please meet me there for lunch.
The duration of the next. He figured I was only joking!
I brought back the time.
This is what I am getting instead
I'm going to the mall for breakfast, Please
meet me there for lunch. The duration of
the next. He figured I was only joking!
I brought back the time.
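Answer 0 (score: 1)
You are adding line breaks to a sequence of words. The main condition for a line break is that the last word ends with . or !. There is also a secondary condition on the minimum length (8 words or more). The following code collects words into a buffer until the conditions for printing a line are met.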
with open("file.txt") as c:
    out = []
    for line in c:
        for word in line.split():
            out.append(word)
            if word.endswith(('.', '!')) and len(out) >= 8:
                print(' '.join(out))
                out.clear()
    # don't forget to flush the buffer
    if out:
        print(' '.join(out))
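To try the buffering idea without a file on disk, here is a minimal sketch that wraps the same logic in a function and runs it on the sample text from the question. The function name `split_sentences`, its `min_words` and `enders` parameters, and the in-memory `sample` string are assumptions made for illustration only.

```python
def split_sentences(text, min_words=8, enders=('.', '!')):
    """Group words into lines that have at least `min_words` words
    and whose last word ends with one of `enders`.

    Hypothetical helper: the name and parameters are illustrative.
    """
    lines = []
    out = []
    for word in text.split():
        out.append(word)
        # break only when the word ends a sentence AND the line is long enough
        if word.endswith(enders) and len(out) >= min_words:
            lines.append(' '.join(out))
            out = []
    # flush whatever is left in the buffer
    if out:
        lines.append(' '.join(out))
    return lines


sample = ("I'm going to the mall for breakfast, Please meet me there for lunch. "
          "The duration of the next. He figured I was only joking! "
          "I brought back the time.")

for line in split_sentences(sample):
    print(line)
```

Returning a list instead of printing inside the loop makes the result easy to check or reuse; `str.endswith` accepts a tuple of suffixes, so both marks are tested in one call.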
Answer 1 (score: 1)
It looks like you are not telling your code to look for . or !, only to split the text into chunks of 8 words. Here is one solution:
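buffer = []
output = []
with open("file.txt") as c:
    # a file object has no split(); read the text first, then split it
    for word in c.read().split():
        buffer.append(word)
        # parentheses are needed because `and` binds tighter than `or`
        if ('!' in word or '.' in word) and len(buffer) > 7:
            output.append(' '.join(buffer))
            buffer = []
# keep any trailing words that never met the condition
if buffer:
    output.append(' '.join(buffer))
print(output)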
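This gets a list of words, split at the spaces. It adds each word to buffer until your condition is met (word contains punctuation and the buffer holds more than 7 words). It then appends buffer to your output and clears buffer.
I don't know how your file is structured, so I tested this on one long string of sentences. You may need to adjust how the input is read so that it matches what the code expects.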
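One detail worth checking in any condition that mixes `or` and `and`: in Python, `and` binds more tightly than `or`, so `a or b and c` is evaluated as `a or (b and c)`. The short sketch below compares the two groupings on a buffer that is still too short to break on; the function names `ungrouped` and `grouped` are made up for this illustration.

```python
def ungrouped(word, buffer):
    # parsed as: '!' in word or ('.' in word and len(buffer) > 7)
    return '!' in word or '.' in word and len(buffer) > 7


def grouped(word, buffer):
    # the length check now applies to both punctuation marks
    return ('!' in word or '.' in word) and len(buffer) > 7


short_buffer = ['only', 'joking!']          # 2 words, far fewer than 8
print(ungrouped('joking!', short_buffer))   # True: '!' alone triggers a break
print(grouped('joking!', short_buffer))     # False: buffer is still too short
```

Without the parentheses, any word containing ! would end the line regardless of how many words are in the buffer.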
Answer 2 (score: 0)
I'm not sure how to do this with a list comprehension, but you could try a regular for loop.
with open("file.txt") as c:
    for line in c:
        text = line.split()
        n = 8
        temp = []
        listword = []
        for val in text:
            temp.append(val)
            # break once the chunk has at least n words and ends a sentence
            if len(temp) >= n and val.endswith(('.', '!')):
                listword.append(' '.join(temp))
                temp = []
        if temp:  # append leftover words that never reached a break
            listword.append(' '.join(temp))
        for lsb in listword:
            print(lsb)
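Answer 3 (score: 0)
As you probably already know, you have not written any code that checks for punctuation. The best approach is probably to use two indices to track the start and end of the section to print. The section must contain at least 8 words, but must be longer if no punctuation is found on the 8th word.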
n = 8
with open('file.txt') as c:
    for line in c:
        words = line.split()
        # Use two indexes to keep track of which section to print;
        # start is inclusive, end is exclusive
        start = 0
        while start < len(words):
            end = start + n  # a section holds at least n words
            # If the last word of this section has no punctuation,
            # advance end until punctuation is found or the words run out
            while (end < len(words)
                   and '.' not in words[end - 1]
                   and '!' not in words[end - 1]):
                end += 1
            # the final section is printed regardless of punctuation
            print(' '.join(words[start:end]))
            start = end  # the next section starts right after this one
Result:
I'm going to the mall for breakfast, Please meet me there for lunch.
The duration of the next. He figured I was only joking!
I brought back the time.
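To reproduce this result without a file.txt on disk, the two-index idea can be sketched as a small generator that works on a list of words. This is only a sketch under stated assumptions: the name `chunk_words`, its `n` and `marks` parameters, and the in-memory `sample` string are hypothetical, and `end` is treated as an exclusive index throughout.

```python
def chunk_words(words, n=8, marks=('.', '!')):
    """Yield sections of at least `n` words whose last word contains
    one of `marks`; the final section is yielded as-is.

    Hypothetical helper: the name and parameters are illustrative.
    """
    start = 0
    while start < len(words):
        # exclusive end index, capped at the end of the list
        end = min(start + n, len(words))
        # grow the section until its last word carries punctuation
        while end < len(words) and not any(m in words[end - 1] for m in marks):
            end += 1
        yield ' '.join(words[start:end])
        start = end  # the next section begins right after this one


sample = ("I'm going to the mall for breakfast, Please meet me there for lunch. "
          "The duration of the next. He figured I was only joking! "
          "I brought back the time.")

for section in chunk_words(sample.split()):
    print(section)
```

Keeping `end` exclusive means the same slice `words[start:end]` is correct whether the punctuation sits exactly on the 8th word or several words later.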