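I'm working on a project in Python and trying to split a list of words into per-letter files, so that any word starting with 'a' or 'A' goes into the file 'A.html'. I can create that one file and collect all the words starting with that letter, but I need to do this recursively so that it goes through every letter and puts the words into separate files. Here is some of the code:

    import os
    from functools import reduce

    class LetterIndexPage(object):
        def __init__(self, wordPage):
            self.alphaList = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','Numbers','Special Characters']

        def createLetterPages(self):
            if not os.path.exists('A.html'):
                open('A.html', 'w').close()
            letterFileName = 'A.html'
            letterItemList = []
            for item in wordItems():
                if item[:1] == 'a' or item[:1] == 'A':
                    letterItemList.append(item)
            letterItems = reduce(lambda letterItem1, letterItem2: letterItem1 + letterItem2, letterItemList)
            return letterItems

The wordItems() method returns all the text from the web page. I don't know where to go from here. Can anyone help?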
Answer 0 (score: 0)
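    from itertools import groupby
    import requests

    def first_letter(word):
        return word[0].upper()

    page = requests.get('http://www.somepage.com/some.txt')
    all_words = page.text.split()
    # groupby only merges adjacent items, so sort by the same key first
    groups = groupby(sorted(all_words, key=first_letter), first_letter)
    for letter, words in groups:
        with open("%s.html" % letter, "a") as f:
            f.write("\n".join(words) + "\n")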
I think this should work (untested...)
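The question's alphaList also has 'Numbers' and 'Special Characters' entries, which grouping on the first letter alone does not cover. Below is a rough sketch of one way to handle them, with no network access so it can be tried locally. The names bucket_name, group_words and write_letter_pages, and the out_dir parameter, are made up for illustration and do not come from the question.

```python
from collections import defaultdict
import os


def bucket_name(word):
    """Return the index bucket for a word: 'A'..'Z', 'Numbers' or 'Special Characters'."""
    first = word[0]
    if first.isascii() and first.isalpha():
        return first.upper()
    if first.isdigit():
        return 'Numbers'
    return 'Special Characters'


def group_words(words):
    """Map each bucket name to the list of words that belong in it."""
    groups = defaultdict(list)
    for word in words:
        if word:  # skip empty strings, which have no first character
            groups[bucket_name(word)].append(word)
    return groups


def write_letter_pages(words, out_dir='.'):
    """Write one '<bucket>.html' file per non-empty bucket; return the paths written."""
    written = []
    for bucket, items in sorted(group_words(words).items()):
        path = os.path.join(out_dir, bucket + '.html')
        with open(path, 'w', encoding='utf-8') as f:
            f.write('\n'.join(items) + '\n')
        written.append(path)
    return written
```

Calling write_letter_pages(wordItems()) would be the rough equivalent for the question's class, assuming wordItems() returns an iterable of words rather than one long string.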
Answer 1 (score: 0)
Open the files first, do the work, then close them:
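    from string import ascii_uppercase

    # list_of_words is assumed to be defined elsewhere
    output_files = {letter: open(letter + '.html', 'w') for letter in ascii_uppercase}
    for word in list_of_words:
        first = word[0].upper()
        if first in output_files:  # skip words that do not start with A-Z
            output_files[first].write(word + '\n')
    # iterate over the file objects, not the dict keys
    for f in output_files.values():
        f.close()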
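If an exception is raised partway through, the explicit close loop never runs. One possible variation, sketched here, uses contextlib.ExitStack so that all 26 files are closed either way. The function name split_into_letter_files, the out_dir parameter and the returned list of skipped words are assumptions added for illustration.

```python
from contextlib import ExitStack
from string import ascii_uppercase
import os


def split_into_letter_files(list_of_words, out_dir='.'):
    """Write each word to '<LETTER>.html'; return the words that matched no letter."""
    skipped = []
    with ExitStack() as stack:
        # Every file registered on the stack is closed when the with block exits,
        # even if writing fails halfway through.
        output_files = {
            letter: stack.enter_context(
                open(os.path.join(out_dir, letter + '.html'), 'w', encoding='utf-8'))
            for letter in ascii_uppercase
        }
        for word in list_of_words:
            key = word[:1].upper()  # word[:1] is '' for an empty string, so no IndexError
            if key in output_files:
                output_files[key].write(word + '\n')
            else:
                skipped.append(word)
    return skipped
```

Note that, like the code above it, this creates all 26 files even when some letters have no words.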