Python: How to replace a span tag with a formatted tag and fetch the resulting page with Beautiful Soup

Time: 2018-10-09 08:49:54

Tags: python python-3.x

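I'm currently learning how to use Beautiful Soup. I'm trying to parse the search page of the USDA food database and replace the entire span tag inside the div (class "btn-group") with my formatted variable, search_output. After that, I want to submit the modified page and read the resulting page (this is the part I mainly need help with).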

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

class Global:
    search_input = ''

##### Start function ##### SEARCH: RED APPLE as test
def search_food():
    Global.search_input = input("What food do you want to search for?")
search_food()

##### Formatted search span replacement #####
search_output = f'<input type="text" class="searchbox ui-autocomplete-input" placeholder="For example: raw broccoli" value="{Global.search_input}" title="Enter search terms" name="qlookup" id="qlookup" size="180" autocomplete="off">'

##### URL #####
my_url = 'https://ndb.nal.usda.gov/ndb/search/list?SYNCHRONIZER_TOKEN=89dea722-0ff1-487e-9668-f42d2f61d19e&SYNCHRONIZER_URI=%2Fndb%2Fsearch%2Flist&qt=&qlookup=&ds=SR&manu='

##### Download the page #####
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

##### Html parsing #####
page_soup = soup(page_html, "html.parser")
search_container = page_soup.body.find('div', attrs={'class': 'btn-group'})
tag = search_container.find('span', attrs={'class': 'value'})
tag.string = search_output
tag

##### Resulting search page / Haven't gotten this far #####
#s_result_page = #resulting page
#uClient_result2 = uReq(s_result_page)
#page_html2 = uClient_result2.read()
#uClient_result2.close()

It returns:

Traceback (most recent call last):
  File "workout_db.py", line 25, in <module>
    tag = search_container.find('span', attrs={'class': 'value'})
AttributeError: 'NoneType' object has no attribute 'find'
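
The traceback means that the first `find` call returned `None`, that is, no `div` with class "btn-group" was found in the downloaded HTML, so the chained `.find` fails. A minimal sketch of guarding against that case follows. The `SAMPLE_HTML` string and the `find_value_span` helper are hypothetical stand-ins for illustration, and the real USDA markup may differ:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
  <div class="btn-group"><span class="value">old text</span></div>
</body></html>
"""


def find_value_span(html):
    """Return the span.value inside div.btn-group, or None if either is missing."""
    page_soup = BeautifulSoup(html, "html.parser")
    container = page_soup.find("div", attrs={"class": "btn-group"})
    if container is None:
        # find() returns None when nothing matches, so stop before chaining.
        return None
    return container.find("span", attrs={"class": "value"})


tag = find_value_span(SAMPLE_HTML)
missing = find_value_span("<html><body><p>no search box here</p></body></html>")

print(tag)      # the matching span element
print(missing)  # None, instead of an AttributeError
```

Printing `page_soup.prettify()` for the real page is one way to check which classes are actually present in the HTML that was downloaded.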

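Ultimately, my goal is to fetch the results page, for example: https://ndb.nal.usda.gov/ndb/search/list?SYNCHRONIZER_TOKEN=38fe7d59-b22b-46ec-9cf6-cd9321739158&SYNCHRONIZER_URI=%2Fndb%2Fsearch%2Flist&qt=&qlookup=red+apple&ds=SR&manu= and parse it so that my_url becomes the hyperlink of the result most relevant to my search query... then finally read {{ 1}} to parse the nutrition information.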
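
Editing the parsed HTML only changes the local copy and sends nothing to the server. In the example URL above, the search term appears in the `qlookup` query parameter, so one possible approach is to build the results URL directly. This is a minimal sketch: `build_search_url` is a hypothetical helper, and leaving out the `SYNCHRONIZER_TOKEN` and `SYNCHRONIZER_URI` parameters is an assumption that the server may not accept:

```python
from urllib.parse import urlencode

# Base path taken from the example URL in the question.
BASE_URL = "https://ndb.nal.usda.gov/ndb/search/list"


def build_search_url(search_input):
    """Build a results URL with the search term in the qlookup parameter."""
    # Assumption: the SYNCHRONIZER_* parameters are omitted here; the
    # server may or may not require them.
    params = {"qt": "", "qlookup": search_input, "ds": "SR", "manu": ""}
    # urlencode escapes the values, e.g. a space becomes "+".
    return BASE_URL + "?" + urlencode(params)


search_url = build_search_url("red apple")
print(search_url)
# https://ndb.nal.usda.gov/ndb/search/list?qt=&qlookup=red+apple&ds=SR&manu=
```

The resulting string could then be passed to `uReq` in the same way as `my_url`.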

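The main problem I'm facing is in the upper half of the code above, but I felt I needed to explain what I'm trying to do in order to paint a clearer picture. Any help would be greatly appreciated.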
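
On the replacement step itself: assigning markup to `tag.string` sets it as text, so Beautiful Soup escapes it on output instead of inserting an element. A minimal sketch of swapping the span for a real tag with `replace_with` follows. The HTML snippet and the shortened `new_markup` are hypothetical stand-ins for the real page and for search_output:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page markup.
html = '<div class="btn-group"><span class="value">old</span></div>'
page_soup = BeautifulSoup(html, "html.parser")
tag = page_soup.find("span", attrs={"class": "value"})

# Shortened stand-in for the search_output string in the question.
new_markup = '<input type="text" name="qlookup" id="qlookup" value="red apple">'

# Parse the markup into a Tag first, then swap it in for the span.
new_tag = BeautifulSoup(new_markup, "html.parser").input
tag.replace_with(new_tag)

print(page_soup)
```

After the swap, the div contains the input element rather than the original span.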

0 answers:

No answers