Getting a product name by scraping

Posted: 2016-12-26 08:38:46

Tags: python beautifulsoup

Below is my code for getting the product name "RENU FRESH LENS SOLUTION 120 ML" from the URL. The name is inside a p tag, and I only need the name.

import requests
import lxml
from bs4 import BeautifulSoup

url = "http://www.lenskart.com/renu-fresh-lens-solution-100-ml.html"

source = requests.get(url)
data = source.content
soup = BeautifulSoup(data, "lxml")

# Currently prints the text of the whole div, not just the name
pn = soup.find_all("div", {"class":"prcdt-overview"})[0].text
print(pn)
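To see why the code above prints more than the name, here is a minimal offline sketch. It assumes hypothetical markup for the product block (the real page is not reproduced in the question) and uses Python's built-in "html.parser" so that lxml does not need to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real lenskart.com page structure is not shown here.
HTML = """
<div class="prcdt-overview">
  <h1 itemprop="name"><p>RENU FRESH LENS SOLUTION 120 ML</p></h1>
  <span>Other product details</span>
</div>
"""

# html.parser ships with Python, so this sketch does not need lxml installed.
soup = BeautifulSoup(HTML, "html.parser")
div = soup.find_all("div", {"class": "prcdt-overview"})[0]

# .text concatenates the text of every descendant, extra details included.
whole = div.text
# Narrowing to the first <p> inside the div yields only the product name.
name = div.find("p").text

print(whole)
print(name)
```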

3 answers:

Answer 0 (score: 2)

import requests
from bs4 import BeautifulSoup

url = "http://www.lenskart.com/renu-fresh-lens-solution-100-ml.html"

source = requests.get(url)
# No need for data = source.content; pass source.content straight to BeautifulSoup()
soup = BeautifulSoup(source.content, "lxml")

find() version:

pn = soup.find('div', class_="prcdt-overview").p.text
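  1. You don't need to import lxml yourself; BeautifulSoup loads the parser when you pass "lxml" as the second argument (the lxml package must still be installed).
  2. If you only need the first tag from find_all(), use find() instead: it returns the first tag that find_all() would return, or None if nothing matches.
  3. You can chain tag.tag.find()/find_all() to step down to the tag you want.
  4. tag.tag_name is short for tag.find('tag_name').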
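The points above can be combined into one small function. This is a sketch only: `product_name` is a hypothetical helper name, the HTML is invented to mirror the class name used in this answer, and the None check is there because find() returns None when nothing matches, which would otherwise make `.p.text` raise AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real product page.
HTML = """
<div class="prcdt-overview">
  <h1 itemprop="name"><p>RENU FRESH LENS SOLUTION 120 ML</p></h1>
</div>
"""


def product_name(soup):
    """Hypothetical helper: text of the first <p> in div.prcdt-overview, or None."""
    # find() returns the first match (what find_all()[0] would give) or None.
    div = soup.find("div", class_="prcdt-overview")
    # div.p is shorthand for div.find("p"); it is also None when absent.
    if div is None or div.p is None:
        return None
    return div.p.text.strip()


soup = BeautifulSoup(HTML, "html.parser")
print(product_name(soup))
print(product_name(BeautifulSoup("<div>no product here</div>", "html.parser")))
```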

CSS selector version:

soup.select_one(".prcdt-overview p").text

  1. select_one() returns the first tag from select(), just as find() does for find_all().
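A short offline sketch of the select()/select_one() relationship, again against invented markup that only mirrors the class name in the selector. It also shows the practical difference on a miss: select_one() returns None, whereas indexing an empty select() result raises IndexError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real product page.
HTML = """
<div class="prcdt-overview">
  <h1 itemprop="name"><p>RENU FRESH LENS SOLUTION 120 ML</p></h1>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# select() returns a list of every element matching the CSS selector.
matches = soup.select(".prcdt-overview p")
# select_one() returns only the first match, like select(...)[0].
first = soup.select_one(".prcdt-overview p")
# On no match, select_one() gives None instead of an empty list.
missing = soup.select_one(".no-such-class p")

print(len(matches), first.text)
print(missing)
```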

Answer 1 (score: 1)

Try this:

pn = soup.select(".prcdt-overview h1[itemprop=name] p")[0].text

pn = soup.select(".prcdt-overview")[0].select("h1[itemprop=name]>p")[0].text
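Both selectors can be checked offline against hypothetical markup in which the <p> sits directly inside h1[itemprop=name]; the real page structure is an assumption here, and "html.parser" is used so the nesting is kept exactly as written.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with the <p> as a direct child of h1[itemprop=name].
HTML = """
<div class="prcdt-overview">
  <h1 itemprop="name"><p>RENU FRESH LENS SOLUTION 120 ML</p></h1>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Descendant combinators (spaces): the <p> may sit at any depth under the h1.
a = soup.select(".prcdt-overview h1[itemprop=name] p")[0].text
# Child combinator (>): the <p> must be a direct child of the h1.
b = soup.select(".prcdt-overview")[0].select("h1[itemprop=name]>p")[0].text

print(a)
print(b)
```

The first selector would still match if the <p> were nested deeper inside the h1; the second is stricter and matches only a direct child.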

There are other ways as well; try these.

Hope this helps.

Answer 2 (score: 1)

A more verbose way:
