Extracting numbers with Python and BeautifulSoup (multiple spans and classes)

Time: 2017-04-19 05:31:26

Tags: python python-2.7 beautifulsoup

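I'm trying to use Python with BeautifulSoup to extract several numbers from a web page. I know I'm doing something wrong, because my script returns an empty list. The fact that there are multiple spans and classes also confuses me. Here is a sample of the HTML I'm working with: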

<td class="confluenceTd" colspan="1">
         <span>
          Autoworks
         </span>
        </td>
        <td class="confluenceTd" colspan="1">
         900009
        </td>
        <td class="confluenceTd" colspan="1">
         <p>
          uyi: 3456778, 33344778, 11199087
         </p>
         <p>
          PRY: 54675389
         </p>
        </td>
<td class="confluenceTd" colspan="1">
         AutoNone
        </td>
        <td class="confluenceTd" colspan="1">
         9998887
        </td>
        <td class="confluenceTd" colspan="1">
         <p>
          YUN: 232323, 6788889, 78695554
         </p>
         <p>
          IOY: 3444666, 2343233, 1232322
         </p>
        </td>

Here is my Python code:

import requests
from bs4 import BeautifulSoup

s = requests.Session()

s.post('https://wiki.example.com/login', data={'user': "user1", 'password': 
'pass1'})

r = s.get('https://wiki.example.com/example/section')

data_payload = r.content

soup = BeautifulSoup(data_payload, 'html.parser')
data = soup.findAll("span", {"class":"confluenceTd"})

print data            

Again, I only want to extract the actual numbers. Any help would be greatly appreciated. Thanks.

1 answer:

Answer 0 (score: 1)

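If you want all the numbers under a specific class, select the elements that carry that class (here the td cells, not a span) and use a regular expression to pull the digits out of their text. Also make sure the request actually returns the HTML you expect.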

import re
import requests
from bs4 import BeautifulSoup

s = requests.Session()
s.post('https://wiki.example.com/login', data={'user': "user1", 'password': 'pass1'})
r = s.get('https://wiki.example.com/example/section')
data_payload = r.content
soup = BeautifulSoup(data_payload, 'html.parser')
# The confluenceTd class is on the <td> elements, so search for "td"
data = soup.findAll("td", {"class": "confluenceTd"})
for d in data:
    # re.findall returns every run of digits in the cell, not only the first
    for number in re.findall('[0-9]+', d.get_text()):
        print(number)
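
For cells that hold several numbers, the difference between `re.search` (first match only) and `re.findall` (every match) matters. The following is a minimal sketch of that difference on the text of a few sample cells from the question, with no network request involved. The names `cell_texts` and `extract_numbers` are hypothetical, and the strings assume the cell text has already been pulled out of the HTML with its whitespace collapsed.

```python
import re

# Text of three sample cells from the question (assumption: whitespace
# collapsed, as if each cell's text had already been extracted).
cell_texts = [
    "Autoworks",
    "900009",
    "uyi: 3456778, 33344778, 11199087 PRY: 54675389",
]


def extract_numbers(text):
    """Return every run of digits in text, in order, as strings."""
    return re.findall(r"[0-9]+", text)


for text in cell_texts:
    first = re.search(r"[0-9]+", text)  # stops at the first match, or None
    first_number = first.group(0) if first else None
    print("first: %s, all: %s" % (first_number, extract_numbers(text)))
```

For the third cell, `re.search` yields only `3456778`, while `extract_numbers` returns all four numbers. Labels such as `uyi:` and `PRY:` are skipped because the pattern matches digits only.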