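I'm testing my program and have found some bugs. I was able to fix a few of them myself, but I need help with the rest.
Basically, there are three functions (word frequency, chapter counter, and verse counter). [I've only posted the word frequency part.] When I tested word frequency first, I noticed that the starting book and chapter don't work. In other words, if I ask for the word frequency from book 49 chapter 6 (which is the last chapter of book 49) through book 50 chapter 2, book 49 chapter 6 is not included.
How can I fix this?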
from bs4 import BeautifulSoup
import operator
import requests
import re
import bs4
def word_frequency(max_book_w, max_chapter_w):
    word_list = []
    # b and c are module-level globals set from user input further down
    book = b
    chapter = c
    while book <= max_book_w:
        while chapter <= max_chapter_w:
            url = 'http://www.holybible.or.kr/B_NIV/cgi/bibleftxt.php?VR=NIV&VL={}&CN={}&CV=99'.format(book, chapter)
            source_code = requests.get(url).text
            soup = BeautifulSoup(source_code, "html.parser")
            # collect every word of the chapter text, lowercased
            for bible_text in soup.findAll('font', {'class': 'tk4l'}):
                content = bible_text.get_text()
                words = content.lower().replace('-', ' ').split()
                for each_word in words:
                    word_list.append(each_word)
            chapter += 1
        book += 1
        chapter = 1
    clean_up_list(word_list)


def clean_up_list(word_list):
    clean_word_list = []
    for word in word_list:
        symbols = "~:!@#$%^&*()_+`{}|\"?><`-=\][';/.,']"
        for i in range(0, len(symbols)):
            word = word.replace(symbols[i], "")
        if len(word) > 0:
            clean_word_list.append(word)
    dictionary(clean_word_list)


def dictionary(clean_word_list):
    word_count = {}
    for word in clean_word_list:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
        print(key, value)


user = int(input('''What do you need a help on?
1 - word frequency, 2 - chapter count, 3 - verse count'''))
if user == 1:
    b = int(input("type the starting book"))
    c = int(input("type the starting chapter of the book"))
    x = max_book_w = int(input("type the last book"))
    y = max_chapter_w = int(input("what chapter of the book would be the last chapter?"))
    word_frequency(x, y)
elif user == 2:
    min_b = int(input("what is the starting book?"))
    max_b = int(input("what is the last book?"))
    chapter_counter(max_b)
elif user == 3:
    books = int(input("which book do you want?"))
    chapters = int(input("which chapter of the book you want?"))
    if __name__ == '__main__':
        verse(books, chapters)
Edit 1: Actually, it is not only the very first chapter of the range that the program skips. I tried 49-5 to 50-2, and neither 49-5 nor 49-6 was included.
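
A possible explanation, going only by the code as posted: the inner loop condition `chapter <= max_chapter_w` compares against the last chapter of the *last* book for every book. With a range of 49-5 to 50-2, the first check is `5 <= 2`, which is false, so nothing in book 49 is ever fetched. The sketch below shows one way to walk the range, assuming the number of chapters in each book is known. The names `chapter_range` and `CHAPTERS_IN_BOOK` are hypothetical, and the table only holds the two books from the example (6 chapters for book 49 as stated in the question, 4 for book 50 as an illustrative value).

```python
from typing import Dict, Iterator, Tuple

# Hypothetical lookup table: book number -> number of chapters.
# Only the two books from the example are listed here.
CHAPTERS_IN_BOOK: Dict[int, int] = {49: 6, 50: 4}


def chapter_range(
    start_book: int,
    start_chapter: int,
    end_book: int,
    end_chapter: int,
    chapters_in_book: Dict[int, int],
) -> Iterator[Tuple[int, int]]:
    """Yield (book, chapter) pairs from the start to the end, inclusive."""
    for book in range(start_book, end_book + 1):
        # Only the first book starts at start_chapter; later books start at 1.
        first = start_chapter if book == start_book else 1
        # Only the last book stops at end_chapter; earlier books run to their end.
        last = end_chapter if book == end_book else chapters_in_book[book]
        for chapter in range(first, last + 1):
            yield book, chapter


print(list(chapter_range(49, 5, 50, 2, CHAPTERS_IN_BOOK)))
# [(49, 5), (49, 6), (50, 1), (50, 2)]
```

Inside `word_frequency`, the two `while` loops could then be replaced by a single `for book, chapter in chapter_range(...)` loop around the request and parsing code, with the start values passed in as arguments instead of read from the globals `b` and `c`.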