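I'm currently learning how to use Beautiful Soup. I'm trying to parse the search page of the USDA food database and replace the whole span tag inside the div (class "btn-group") with my formatted variable, search_output. After that, I want to submit the modified page and then read the resulting page (this is the main thing I need help with).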
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
class Global:
    search_input = ''
##### Start function ##### SEARCH: RED APPLE as test
def search_food():
    Global.search_input = input("What food do you want to search for?")
search_food()
##### Formatted search span replacement #####
search_output = f'<input type="text" class="searchbox ui-autocomplete-input" placeholder="For example: raw broccoli" value="{Global.search_input}" title="Enter search terms" name="qlookup" id="qlookup" size="180" autocomplete="off">'
##### URL #####
my_url = 'https://ndb.nal.usda.gov/ndb/search/list?SYNCHRONIZER_TOKEN=89dea722-0ff1-487e-9668-f42d2f61d19e&SYNCHRONIZER_URI=%2Fndb%2Fsearch%2Flist&qt=&qlookup=&ds=SR&manu='
##### Download the page #####
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
##### Html parsing #####
page_soup = soup(page_html, "html.parser")
search_container = page_soup.body.find('div', attrs={'class': 'btn-group'})
tag = search_container.find('span', attrs={'class': 'value'})
tag.string = search_output
tag
##### Resulting search page / Haven't gotten this far #####
#s_result_page = #resulting page
#uClient_result2 = uReq(s_result_page)
#page_html2 = uClient_result2.read()
#uClient_result2.close()
It returns:
Traceback (most recent call last):
  File "workout_db.py", line 25, in <module>
    tag = search_container.find('span', attrs={'class': 'value'})
AttributeError: 'NoneType' object has no attribute 'find'
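
The error means that the `find('div', attrs={'class': 'btn-group'})` call found no match and returned `None`, so the next `.find` call failed. Below is a minimal sketch of guarding against that and of swapping the span for a real tag. It runs on hypothetical sample markup, not the live USDA page, and the helper name `replace_span_with_input` is made up for illustration. Whether the downloaded page actually contains such a div is something this sketch cannot confirm.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real search page (an assumption).
SAMPLE_HTML = """
<html><body>
  <div class="btn-group"><span class="value">old</span></div>
</body></html>
"""


def replace_span_with_input(html, query):
    """Swap span.value inside div.btn-group for an <input>; None if not found."""
    page = BeautifulSoup(html, "html.parser")
    container = page.find("div", attrs={"class": "btn-group"})
    if container is None:
        # find() returns None when nothing matches, so check before chaining.
        return None
    span = container.find("span", attrs={"class": "value"})
    if span is None:
        return None
    # Build a real tag; assigning markup to span.string would insert escaped text.
    new_input = page.new_tag(
        "input", attrs={"type": "text", "name": "qlookup", "value": query}
    )
    span.replace_with(new_input)
    return page


result = replace_span_with_input(SAMPLE_HTML, "red apple")
missing = replace_span_with_input("<html><body><p>no form</p></body></html>", "x")
```

Here `result` holds the modified document, while `missing` is `None` because the second document has no matching div. Printing `page_soup.prettify()` on the real download would show which elements are actually present.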
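Ultimately my goal is to get the results page, e.g. https://ndb.nal.usda.gov/ndb/search/list?SYNCHRONIZER_TOKEN=38fe7d59-b22b-46ec-9cf6-cd9321739158&SYNCHRONIZER_URI=%2Fndb%2Fsearch%2Flist&qt=&qlookup=red+apple&ds=SR&manu=
and parse it so that my_url becomes the hyperlink of the result most relevant to my search query, and then finally read that page to parse the nutrition information.
The main problem I'm facing is the top half of the code above, but I felt I needed to explain what I'm trying to do to paint a clearer picture. Any help would be greatly appreciated.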
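
For the goal of reaching the results page, one possible approach is to put the search term directly into the `qlookup` query parameter instead of editing the HTML. The sketch below only builds the URL. It assumes the site accepts the request without the `SYNCHRONIZER_TOKEN` and `SYNCHRONIZER_URI` parameters, which is not verified here; `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Base path taken from the URLs in the question.
BASE_URL = "https://ndb.nal.usda.gov/ndb/search/list"


def build_search_url(query):
    """Return a search URL with the query placed in the qlookup parameter."""
    # Assumption: the token parameters from the original URL are optional.
    params = {"qt": "", "qlookup": query, "ds": "SR", "manu": ""}
    # urlencode escapes the values, e.g. spaces become '+'.
    return f"{BASE_URL}?{urlencode(params)}"


search_url = build_search_url("red apple")
```

The resulting `search_url` could then be passed to `uReq` in place of `my_url`, and the response parsed with Beautiful Soup as in the code above.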