Beautiful Soup: findAll and quoted classes

Time: 2017-05-10 04:29:30

Tags: python class beautifulsoup

I'm a new Python user and I've hit a wall with a Beautiful Soup problem. My target page contains the following:

<div class=rbHeader>
<span role="heading" aria-level="3" class="ws_bold">
Experience Level</span>
</div>

<div class="  row  result" id="p_bc0437dce636c6f4" data-jk="bc0437dce636c6f4" itemscope itemtype="http://schema.org/JobPosting" data-tn-component="organicJob">

...

</div>

I have parsed the page as follows:

   target = Soup(urllib.urlopen(url), "lxml") 

If I run

targetElements = target.findAll('div', attrs={'class':'rbheader'})
print targetElements

I get

 [<div class="rbHeader">\n<span aria-level="3" class="ws_bold" role="heading">\nExperience Level</span>\n</div>]

But if I run

targetElements = target.findAll('div', attrs={'class':'  row  result'})
print targetElements

I get

[]

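This happens for whichever class I try to select whenever the class is written in quotes. I only seem to be able to find classes that appear without quotes.

Any help is much appreciated.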

Best,

Ryan

2 answers:

Answer 0: (score: 1)

Always strip the whitespace from the class names you search for.

You can match on one class:

targetElements = target.findAll('div', attrs={'class':'row'})

...or the other:

targetElements = target.findAll('div', attrs={'class':'result'})

If you suspect that either one on its own might return too many results, you can require both:

targetElements = soup.select('div.row.result')

...where soup is your BeautifulSoup instance (target in your code).
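As a rough illustration of the three lookups above, here is a minimal sketch. It assumes the `bs4` package is installed and uses the built-in `html.parser`; the markup is trimmed from the question, and the second `div` with `id="other"` is a hypothetical element added only to show how the single-class searches differ from the combined selector.

```python
from bs4 import BeautifulSoup

# Trimmed markup from the question, plus one hypothetical extra div
# (id="other") that carries only the "row" class.
html = (
    '<div class=rbHeader><span class="ws_bold">Experience Level</span></div>'
    '<div class="  row  result" id="p_bc0437dce636c6f4"></div>'
    '<div class="row" id="other"></div>'
)

soup = BeautifulSoup(html, "html.parser")

# Searching for a single class matches any tag that has that class,
# regardless of what other classes it also carries.
by_row = soup.find_all("div", attrs={"class": "row"})
by_result = soup.find_all("div", attrs={"class": "result"})

# A CSS selector requires both classes on the same tag.
both = soup.select("div.row.result")

row_ids = [tag.get("id") for tag in by_row]
result_ids = [tag.get("id") for tag in by_result]
both_ids = [tag.get("id") for tag in both]

print(row_ids)
print(result_ids)
print(both_ids)
```

The selector form does not care about the order of the classes or the amount of whitespace inside the `class` attribute, which is why it is usually the safer choice for multi-class elements.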

Answer 1: (score: 0)

Here is an example based on your div:

import bs4

div_test='<div class=rbHeader><span role="heading" aria-level="3" class="ws_bold">Experience Level</span></div><div class="  row  result" id="p_bc0437dce636c6f4" data-jk="bc0437dce636c6f4" itemscope itemtype="http://schema.org/JobPosting" data-tn-component="organicJob"></div>'
target = bs4.BeautifulSoup(div_test,'html.parser')

1. Class names are case-sensitive, so your code

targetElements = target.findAll('div', attrs={'class':'rbheader'})
print targetElements

will return nothing, just [], whereas

targetElements = target.findAll('div', attrs={'class':'rbHeader'})
print targetElements

will give you:

[<div class="rbHeader"><span aria-level="3" class="ws_bold" role="heading">Experience Level</span></div>]
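If the exact capitalization of a class is not known in advance, one option is to pass a compiled regular expression instead of a string. The following is a minimal sketch, assuming `bs4` is installed; the pattern and variable names are illustrative and not part of the original answer.

```python
import re

from bs4 import BeautifulSoup

html = '<div class=rbHeader><span class="ws_bold">Experience Level</span></div>'
soup = BeautifulSoup(html, "html.parser")

# A plain string is compared case-sensitively against each class.
lower_hits = soup.find_all("div", attrs={"class": "rbheader"})
exact_hits = soup.find_all("div", attrs={"class": "rbHeader"})

# A compiled regex is searched against each class, so the IGNORECASE
# flag makes the lookup case-insensitive.
pattern = re.compile(r"^rbheader$", re.IGNORECASE)
regex_hits = soup.find_all("div", attrs={"class": pattern})

print(len(lower_hits), len(exact_hits), len(regex_hits))
```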

2. The code:

targetElements = target.findAll('div', attrs={'class':'  row  result'})
print targetElements

gives you a result rather than an empty list:

[<div class=" row result" data-jk="bc0437dce636c6f4" data-tn-component="organicJob" id="p_bc0437dce636c6f4" itemscope="" itemtype="http://schema.org/JobPosting"></div>]
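Whether a literal string containing extra whitespace matches may depend on the parser and the Beautiful Soup version in use, so it can be more robust to compare the parsed class list instead. This is a minimal sketch under that assumption; `has_all_classes` and the second `div` with `id="plain"` are hypothetical names introduced for illustration, and the `id` values are shortened.

```python
from bs4 import BeautifulSoup

html = (
    '<div class="  row  result" id="job"></div>'
    '<div class="row" id="plain"></div>'
)
soup = BeautifulSoup(html, "html.parser")

wanted = {"row", "result"}


def has_all_classes(tag):
    # tag.get("class") returns the class attribute as a list of names,
    # so the check ignores spacing and ordering in the raw attribute.
    return tag.name == "div" and wanted.issubset(tag.get("class", []))


matches = soup.find_all(has_all_classes)
match_ids = [tag["id"] for tag in matches]
print(match_ids)
```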