Add a comma after each text in soup.find_all()

Time: 2018-11-21 01:01:22

Tags: python selenium beautifulsoup

soup = soup.find_all('tr') gives:

[<tr data-row="0"><th class="left " csk="Murray,Jamal" data-append-csv="murraja01"
 data-stat="player" scope="row"><a href="/players/m/murraja01.html">Jamal Murray</a></th>
<td class="right " csk="2713" data-stat="mp">45:13</td>
<td class="right " data-stat="fg">5</td>
<td class="right " data-stat="fga">12</td>
<td class="right " data-stat="fg_pct">.417</td>
<td class="right " data-stat="fg3">3</td>
<td class="right " data-stat="fg3a">6</td>
<td class="right " data-stat="fg3_pct">.500</td>
<td class="right " data-stat="ft">2</td>
<td class="right " data-stat="fta">2</td>
<td class="right " data-stat="ft_pct">1.000</td>
<td class="right " data-stat="orb">1</td>
<td class="right " data-stat="drb">3</td>
<td class="right " data-stat="trb">4</td>
<td class="right " data-stat="ast">5</td>
<td class="right " data-stat="stl">1</td>
<td class="right " data-stat="blk">1</td>
<td class="right " data-stat="tov">5</td>
<td class="right " data-stat="pf">1</td>
<td class="right " data-stat="pts">15</td>
<td class="right " data-stat="plus_minus">+6</td></tr>]

[x.text for x in soup.find_all('tr', {'data-row': 0})] gives:

['Jamal Murray45:13512.41736.500221.0001345115115+6']

Expected list:

['Jamal Murray', '45:13', '5', '12', '.417', '3', '6', '.500', '2', '2', '1.000', '1', '3', '4', '5', '1', '1', '5', '1', '15', '+6']

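How can I separate the text of each cell (the th and td tags) with a comma, so that I get a list like the expected list above instead of one concatenated string?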
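The strings run together because Tag.text joins every descendant string with an empty separator. BeautifulSoup's get_text() accepts a separator argument, so a comma can be inserted between the texts directly. A minimal sketch, assuming a shortened copy of the row above (only the first few cells kept):

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's row: only the first five cells are kept
html = ('<table><tr data-row="0">'
        '<th data-stat="player"><a href="/players/m/murraja01.html">Jamal Murray</a></th>'
        '<td data-stat="mp">45:13</td>'
        '<td data-stat="fg">5</td>'
        '<td data-stat="fga">12</td>'
        '<td data-stat="fg_pct">.417</td>'
        '</tr></table>')

soup = BeautifulSoup(html, 'html.parser')
row = soup.find('tr', {'data-row': '0'})

# .text concatenates all descendant strings with nothing in between
joined = row.text
# get_text() puts the separator between strings; strip=True also drops
# whitespace-only strings (e.g. newlines between tags)
with_commas = row.get_text(separator=',', strip=True)
# Splitting on the separator gives one list entry per text node
cells = with_commas.split(',')

print(joined)       # Jamal Murray45:13512.417
print(with_commas)  # Jamal Murray,45:13,5,12,.417
print(cells)        # ['Jamal Murray', '45:13', '5', '12', '.417']
```

Splitting on ',' only works while no cell text itself contains a comma; reading each cell's text separately avoids that pitfall.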

1 answer:

Answer 0 (score: 1)

from bs4 import BeautifulSoup as bs

# The <tr> row from the question (attribute names rejoined where the post wrapped them)
html = '''<tr data-row="0"><th class="left " csk="Murray,Jamal" data-append-csv="murraja01"
 data-stat="player" scope="row"><a href="/players/m/murraja01.html">Jamal Murray</a></th>
<td class="right " csk="2713" data-stat="mp">45:13</td>
<td class="right " data-stat="fg">5</td>
<td class="right " data-stat="fga">12</td>
<td class="right " data-stat="fg_pct">.417</td>
<td class="right " data-stat="fg3">3</td>
<td class="right " data-stat="fg3a">6</td>
<td class="right " data-stat="fg3_pct">.500</td>
<td class="right " data-stat="ft">2</td>
<td class="right " data-stat="fta">2</td>
<td class="right " data-stat="ft_pct">1.000</td>
<td class="right " data-stat="orb">1</td>
<td class="right " data-stat="drb">3</td>
<td class="right " data-stat="trb">4</td>
<td class="right " data-stat="ast">5</td>
<td class="right " data-stat="stl">1</td>
<td class="right " data-stat="blk">1</td>
<td class="right " data-stat="tov">5</td>
<td class="right " data-stat="pf">1</td>
<td class="right " data-stat="pts">15</td>
<td class="right " data-stat="plus_minus">+6</td></tr>'''

data = []
page = bs(html, 'html.parser')
# The player name is in the single <th> cell
data.append(page.find('th').text.strip())
# Each stat is in its own <td> cell, so take each cell's text separately
for item in page.find_all('td'):
    data.append(item.text)
print(data)


Output:
['Jamal Murray', '45:13', '5', '12', '.417', '3', '6', '.500', '2', '2', '1.000', '1', '3', '4', '5', '1', '1', '5', '1', '15', '+6']
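The same idea can be extended to every row of a table. Passing a list of tag names to find_all() matches both th and td cells in document order, so the name and stats come out in one pass. A minimal sketch, assuming a small table in the question's shape; the second player ("Player Two") and its numbers are hypothetical sample data:

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table; "Player Two" and its values are made-up sample data
html = ('<table>'
        '<tr data-row="0"><th data-stat="player"><a href="#">Jamal Murray</a></th>'
        '<td data-stat="mp">45:13</td><td data-stat="pts">15</td>'
        '<td data-stat="plus_minus">+6</td></tr>'
        '<tr data-row="1"><th data-stat="player"><a href="#">Player Two</a></th>'
        '<td data-stat="mp">30:00</td><td data-stat="pts">8</td>'
        '<td data-stat="plus_minus">-2</td></tr>'
        '</table>')

soup = BeautifulSoup(html, 'html.parser')

rows = []
records = []
for tr in soup.find_all('tr'):
    # A list of names matches either <th> or <td>, in document order
    cells = tr.find_all(['th', 'td'])
    # strip=True trims surrounding whitespace inside each cell
    rows.append([cell.get_text(strip=True) for cell in cells])
    # Keying by the data-stat attribute keeps each value labelled
    records.append({cell['data-stat']: cell.get_text(strip=True) for cell in cells})

print(rows)
# [['Jamal Murray', '45:13', '15', '+6'], ['Player Two', '30:00', '8', '-2']]
print(records[0])
# {'player': 'Jamal Murray', 'mp': '45:13', 'pts': '15', 'plus_minus': '+6'}
```

Keying by data-stat is optional, but it means a column can be looked up by name (for example records[0]['pts']) instead of by position.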