Python 3, BeautifulSoup - multiple div tags with the same name

Asked: 2017-11-15 12:43:40

Tags: python web-scraping beautifulsoup

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # or BeautifulSoup(html, "lxml")
items = soup.find_all("div", "_3u1 _gli _uvb", recursive=True)
for item in items:
    # .text on the whole "_glo" div joins every nested string into one
    abouts = item.find_all("div", {"class": "_glo"}, recursive=True)[0].text
    print(abouts)

HTML page:

          <div class="_glo">
            <div>
              <div class="_ajw">
                <div class="_52eh">
                    "text
                </div>
              </div>
              <div class="_ajw">
                <div class="_52eh">
                    "text"
                </div>
              </div>
              <div class="_ajw">
                <div class="_52eh">
                   "text"
                </div>
              </div>
            </div>
          </div>

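Good afternoon. I'm trying to scrape a web page with BeautifulSoup and Python. I need each "text" string in its own variable. When I print the result, I get "text text" as a single string, and I would like the strings to be separate.

Kind regards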

2 answers:

Answer 0: (score: 0)

Try this:

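items = soup.find_all('div', attrs={'class': '_ajw'})
texts = {}  # avoid naming this "dict", which would shadow the built-in
for i in range(len(items)):
    # Each "_ajw" wrapper holds one "_52eh" div containing the string
    texts['text' + str(i + 1)] = items[i].find('div', attrs={'class': '_52eh'}).text
print(texts)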

This will give you something like this:

{'text1': text, 'text2': text, 'text3': text}
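
If you really want separate variables rather than a dictionary, here is a minimal self-contained sketch of the same idea. The `html` string is a shortened stand-in for the question's markup, and the placeholder strings `first`, `second`, `third` are made up so the results are distinguishable. It also assumes you know in advance that there are exactly three matches.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's markup (assumed sample, not the real page)
html = """
<div class="_glo"><div>
  <div class="_ajw"><div class="_52eh"> first </div></div>
  <div class="_ajw"><div class="_52eh"> second </div></div>
  <div class="_ajw"><div class="_52eh"> third </div></div>
</div></div>
"""

soup = BeautifulSoup(html, "html.parser")

# One entry per "_ajw" wrapper; get_text(strip=True) trims surrounding whitespace
texts = [
    wrapper.find("div", class_="_52eh").get_text(strip=True)
    for wrapper in soup.find_all("div", class_="_ajw")
]

# Unpacking only works when the number of matches is known in advance
text1, text2, text3 = texts
print(text1, text2, text3)
```

If the number of matches can vary, keeping the list (or the dictionary above) is probably safer than unpacking.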

Answer 1: (score: 0)

I would use soup.select to apply a class selector to the HTML. It is a quick way to get a list of the matching elements by class.

from bs4 import BeautifulSoup as bs

html = '''
  <div class="_glo">
            <div>
              <div class="_ajw">
                <div class="_52eh">
                    "text
                </div>
              </div>
              <div class="_ajw">
                <div class="_52eh">
                    "text"
                </div>
              </div>
              <div class="_ajw">
                <div class="_52eh">
                   "text"
                </div>
              </div>
            </div>
          </div>
          '''
soup = bs(html, 'lxml')

items = [item.text.strip() for item in soup.select('._52eh')]
print(items)
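
One caveat with a bare `._52eh` selector is that it matches that class anywhere on the page. As a rough sketch, assuming the strings you want always sit inside a `_glo` container, you could scope the selector, or read the container's `stripped_strings`. The sample markup and the extra "unrelated" div below are made up for illustration, and `html.parser` is used only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Assumed sample: two wanted strings inside "_glo", one unrelated "_52eh" outside it
html = """
<div class="_glo"><div>
  <div class="_ajw"><div class="_52eh"> first </div></div>
  <div class="_ajw"><div class="_52eh"> second </div></div>
</div></div>
<div class="_52eh"> unrelated </div>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant selector: only "_52eh" divs nested inside a "_glo" container
scoped = [el.get_text(strip=True) for el in soup.select("div._glo div._52eh")]
print(scoped)

# Alternative: stripped_strings yields each non-empty string under the container
container = soup.select_one("div._glo")
strings = list(container.stripped_strings)
print(strings)
```

Both approaches return a list with one entry per string, so each value can be indexed or unpacked separately.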