Can anyone tell me what is wrong with my code? I want to split a large text into smaller texts by words, so that each segment contains 60 words, for example.
file = r'C:\Users\Nujou\Desktop\Master\thesis\steganalysis\dataset\economy2.txt'
openFile = open(file, 'r', encoding='utf-8-sig')
words = openFile.read().split()
#print (words)
i = 0
for idx, w in enumerate(words, start=0):
    textNum = 1
    while textNum <= 20:
        wordAsText = []
        print("word list before:", wordAsText)
        while i < idx + 60:
            wordAsText.append(words[i])
            i += 1
        print("word list after:", wordAsText)
        textSeg = ' '.join(wordAsText)
        print(textNum, textSeg)
        files = open(r"C:\Users\Nujou\Desktop\Master\thesis\steganalysis\dataset\datasetEco\Eco" + str(textNum) + ".txt", "w", encoding='utf-8-sig')
        files.write(textSeg)
        files.close()
        idx += 60
        if textNum != 20:
            continue
        textNum += 1
My large file (economy2) contains more than 12K words.
EDIT: Thanks for all the replies. I tried the approach I found here, and it does what I need.
The modified code:
file = r'C:\Users\Nujou\Desktop\Master\thesis\steganalysis\dataset\economy2.txt'
openFile = open(file, 'r', encoding='utf-8-sig')
words = openFile.read().split()
#print (words)
n = 60
segments = [' '.join(words[i:i+n]) for i in range(0, len(words), n)]  # from link
i = 1
for s in segments:
    seg = open(r"C:\Users\Nujou\Desktop\Master\thesis\steganalysis\dataset\datasetEco\Eco" + str(i) + ".txt", "w", encoding='utf-8-sig')
    seg.write(s)
    seg.close()
    i += 1
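For reference, the same slice-based chunking can be packaged into functions, using `enumerate` for the file counter and a `with` block so each file handle is closed automatically even if a write fails. This is a minimal sketch, assuming the same 60-word segment size; the function names and the output-directory handling (`os.makedirs`) are my own additions, not part of the original code:

```python
import os

def split_into_segments(words, n=60):
    """Return a list of strings, each joining up to n consecutive words."""
    return [' '.join(words[i:i + n]) for i in range(0, len(words), n)]

def write_segments(words, out_dir, prefix="Eco", n=60):
    """Write each n-word segment to out_dir/prefix<k>.txt, numbered from 1."""
    os.makedirs(out_dir, exist_ok=True)  # create the output folder if needed
    for idx, seg in enumerate(split_into_segments(words, n), start=1):
        path = os.path.join(out_dir, prefix + str(idx) + ".txt")
        with open(path, "w", encoding="utf-8-sig") as f:  # closed automatically
            f.write(seg)
```

Note that the last segment is simply shorter than `n` when the word count is not a multiple of 60, which is usually what you want for a dataset split.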