BeautifulSoup does not extract image alt text

Time: 2018-11-25 15:57:36

Tags: python beautifulsoup

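I am at a very early stage of building a web scraper, and I am still quite new to Python. I am trying to extract star ratings from a web page. The goal is to find every image alt text on the page and print it to the console.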

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.nhtsa.gov/vehicle/2017/FORD/ESCAPE/SUV/AWD#safety-ratings-frontal'  # URL to retrieve data from
html = '<div class="col-sm-6"><img src="/sites/nhtsa.dot.gov/themes/nhtsa_gov/images/star-rating/5.png" alt="5 star" class="vehicle-base-details--rating"></div>'  # temporary, for testing
page = urlopen(url)
soup = BeautifulSoup(page, "html.parser")
# Search the whole tree once: looping over every <div> and then its descendants
# would print an image once for each <div> that encloses it.
for img in soup.find_all('img', alt=True):
    print(img['alt'])
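Two details about the fetch step may help when debugging. urlopen returns a response of raw bytes; reading and decoding it yourself lets you search the exact HTML the server sent. Also, everything after "#" in the URL is a fragment handled on the client side; urllib does not send it, so "#safety-ratings-frontal" does not change what the server returns. The sketch below assumes Python 3; build_request and fetch_html are hypothetical helper names, and the User-Agent string is an arbitrary placeholder (whether this site treats urllib's default agent differently is not known).

```python
from urllib.request import Request, urlopen

URL = "https://www.nhtsa.gov/vehicle/2017/FORD/ESCAPE/SUV/AWD#safety-ratings-frontal"


def build_request(url: str) -> Request:
    # Placeholder browser-like User-Agent (an assumption, not a known requirement);
    # urllib's default agent is "Python-urllib/3.x".
    return Request(url, headers={"User-Agent": "Mozilla/5.0 (scraper-test)"})


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the page and decode it to text so it can be inspected directly."""
    with urlopen(build_request(url), timeout=timeout) as resp:
        # Use the charset declared in the Content-Type header, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    req = build_request(URL)
    # Only the part before '#' goes to the server; the fragment stays client-side.
    print(req.selector)  # /vehicle/2017/FORD/ESCAPE/SUV/AWD
    print(req.fragment)  # safety-ratings-frontal
```

Passing the decoded string to BeautifulSoup(fetch_html(url), "html.parser") should behave the same as passing the response object, but having the text in hand makes it easy to check whether the expected markup is there at all.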

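When I replace page with html in the BeautifulSoup(...) call, BeautifulSoup extracts what I need and prints "5 star". The problem appears only when I fetch the HTML directly from the web page. I also tried searching for the images by their class, and when fetching from the site directly I end up with an empty list.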
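One way to narrow this down is to check whether the rating markup is present in the downloaded HTML at all. The sketch below reproduces the class-based search with Python's built-in html.parser.HTMLParser (a swapped-in, dependency-free technique, not the BeautifulSoup code from the question). The class name comes from the test snippet; RatingImgParser and diagnose are hypothetical helper names. If class_name_present is False for the live download, the ratings are not in the HTML the server returned, which commonly happens when a page fills in content with JavaScript after loading. Whether the NHTSA page works that way is an assumption to verify, for example by comparing the browser's "View Page Source" with its element inspector.

```python
from html.parser import HTMLParser

# Class name taken from the test snippet in the question.
RATING_CLASS = "vehicle-base-details--rating"


class RatingImgParser(HTMLParser):
    """Collect the alt text of <img> tags whose class list contains RATING_CLASS."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)  # attribute values may be None for bare attributes
        classes = (attr_map.get("class") or "").split()
        if RATING_CLASS in classes and attr_map.get("alt") is not None:
            self.alts.append(attr_map["alt"])


def diagnose(raw_html: str) -> dict:
    """Summarize what the raw (not JavaScript-rendered) HTML actually contains."""
    parser = RatingImgParser()
    parser.feed(raw_html)
    parser.close()
    return {
        "class_name_present": RATING_CLASS in raw_html,
        "rating_alts": parser.alts,
        "script_tags": raw_html.lower().count("<script"),
    }


if __name__ == "__main__":
    sample = ('<div class="col-sm-6"><img src="/sites/nhtsa.dot.gov/themes/nhtsa_gov/'
              'images/star-rating/5.png" alt="5 star" class="vehicle-base-details--rating"></div>')
    print(diagnose(sample))
    # {'class_name_present': True, 'rating_alts': ['5 star'], 'script_tags': 0}
```

If the class name is missing from the download, no BeautifulSoup query can find it; the usual options are to locate the data request the page makes (in the browser's network tab) or to render the page with a browser automation tool such as Selenium before parsing.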

1 answer:

Answer 0 (score: 0)

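// Create a Fabric.js canvas on the <canvas id="c"> element
const canvas = new fabric.Canvas('c')
// Two 100x100 squares: box1 (green) at (50, 50) and box2 (red) at (250, 250)
const box1 = new fabric.Rect({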
  left: 50,
  top: 50,
  width: 100,
  height: 100,
  fill: 'green'
})
const box2 = new fabric.Rect({
  left: 250,
  top: 250,
  width: 100,
  height: 100,
  fill: 'red'
})
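// Anchor points: bottom-center of box1 and top-center of box2, in canvas coordinates
const box1point = box1.getPointByOrigin('center', 'bottom')
const box2point = box2.getPointByOrigin('center', 'top')
// A black line joining the two anchor points, locked so it cannot be moved, scaled or rotated directly
const connector = new fabric.Line(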
    [box1point.x, box1point.y, box2point.x, box2point.y],
    {
      stroke: "black",
      strokeWidth: 3,
      lockScalingX: true,
      lockScalingY: true,
      lockRotation: true,
      hasControls: true,
      hasBorders: true,
      lockMovementX: true,
      lockMovementY: true
    }
  )
// While box1 is dragged, move the line's start point (x1, y1) to box1's new bottom-center
box1.on('moving', function() {
  const connectPoint = this.getPointByOrigin('center', 'bottom')
  connector.set({
    x1: connectPoint.x,
    y1: connectPoint.y
  })
})
// While box2 is dragged, move the line's end point (x2, y2) to box2's new top-center
box2.on('moving', function() {
  const connectPoint = this.getPointByOrigin('center', 'top')
  connector.set({
    x2: connectPoint.x,
    y2: connectPoint.y
  })
})
// Add both boxes and the connector to the canvas
canvas.add(box1, box2, connector)