Question

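Hello, I am learning Python, so I am trying to write a simple web-scraping script with Beautiful Soup and requests. I am trying to capture the content of an HTML page. The HTML looks like this: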

<frame name="name", src="..">
    #document
        <html>
            <head>
               <script language="JavaScript" src="nav.js"></script>
            <frameset>
                <table>
........

This is what I tried:

import urllib.request
from bs4 import BeautifulSoup

# url, page and getAllFrames() come from earlier in the script (not shown)
with urllib.request.urlopen(url + page['src']) as frameurl:
    response = frameurl.read()
    # print('response', response)
    soup = BeautifulSoup(response, 'html.parser')
    table = soup.find_all('#document')  # want to read the data under this
    frames = getAllFrames(soup)
    for frame in frames:
        if frame['name'] == 'leftnav':
            print('navbar:', frame)
            # print('test', frame.getHtml())
            # frames = soup.find_all("frame")
            print('html', frame.find('html'))  # gives None
            for child in frame.children:  # nothing
                print('child', child)

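How can I read the data/table inside #document, especially what is under the frameset tag? The frameset also has an .html file that is a blank page. (I am also quite familiar with HTML.)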
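A note on #document: it is not an HTML tag. Browser developer tools display it to mark the separate document that has been loaded into a frame, so it never appears in the parent page's source. That is why soup.find_all('#document') matches nothing and a frame tag has no children. One possible approach is to request each frame's src as its own page and parse that. The following is only a minimal sketch, assuming the frame content is served as plain HTML at that src (not generated by JavaScript). The names download, frame_documents and FAKE_SITE and the example.test URLs are hypothetical, introduced here for illustration.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def download(url):
    """Fetch a URL and return the raw response body."""
    with urlopen(url) as resp:
        return resp.read()


def frame_documents(page_url, fetch=download):
    """Return {frame name: parsed document} for every frame on page_url."""
    parent = BeautifulSoup(fetch(page_url), "html.parser")
    docs = {}
    for frame in parent.find_all(["frame", "iframe"]):
        src = frame.get("src")
        if not src:
            continue  # no src attribute, so there is nothing to follow
        # src is usually relative, so resolve it against the parent URL
        frame_url = urljoin(page_url, src)
        # fall back to the URL as the key when the frame has no name
        docs[frame.get("name", frame_url)] = BeautifulSoup(
            fetch(frame_url), "html.parser"
        )
    return docs


# Offline demonstration: made-up pages stand in for real HTTP responses.
FAKE_SITE = {
    "http://example.test/index.html":
        '<frameset><frame name="leftnav" src="nav.html"></frameset>',
    "http://example.test/nav.html":
        "<html><body><table><tr><td>Home</td></tr></table></body></html>",
}
docs = frame_documents("http://example.test/index.html",
                       fetch=FAKE_SITE.__getitem__)
print(docs["leftnav"].find("table").get_text())
```

Against a real site, frame_documents(url) would be called without the fetch argument. If a fetched frame page itself contains a frameset, the same function can be applied again to that frame's URL.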

How to read #document in Beautiful Soup (Python)

0 answers: