I'm new to Python and have hit a wall with BeautifulSoup (BS). The page I'm targeting contains the following:
<div class=rbHeader>
<span role="heading" aria-level="3" class="ws_bold">
Experience Level</span>
</div>
<div class=" row result" id="p_bc0437dce636c6f4" data-jk="bc0437dce636c6f4" itemscope itemtype="http://schema.org/JobPosting" data-tn-component="organicJob">
...
</div>
I parsed the page as follows:
target = Soup(urllib.urlopen(url), "lxml")
If I run
targetElements = target.findAll('div', attrs={'class':'rbheader'})
print targetElements
I get
[<div class="rbHeader">\n<span aria-level="3" class="ws_bold" role="heading">\nExperience Level</span>\n</div>]
But if I run
targetElements = target.findAll('div', attrs={'class':' row result'})
print targetElements
I get
[]
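This happens with whichever class I try to select, as long as the class value is quoted in the HTML. I only seem to be able to find classes that appear without quotes.
Any help is greatly appreciated.
Best, 莱恩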
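Answer 0: (score: 1)
Always strip the whitespace out of the class value you search for. BeautifulSoup treats class as a multi-valued attribute and splits it on whitespace, so " row result" is really two classes, row and result.
You can match on a single class: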
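targetElements = target.findAll('div', attrs={'class':'row'})
...or:
targetElements = target.findAll('div', attrs={'class':'result'})
If you suspect that either one might return too many results, you can require both classes:
targetElements = soup.select('div.row.result')
...where soup is your BeautifulSoup instance (target in your code).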
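A minimal, self-contained sketch of these lookups follows. It assumes Python 3 and bs4 with the built-in html.parser, whereas the question uses Python 2 and lxml. The HTML string is a trimmed copy of the markup from the question, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: extra attributes omitted).
html = (
    '<div class=rbHeader><span class="ws_bold">Experience Level</span></div>'
    '<div class=" row result" id="p_bc0437dce636c6f4"></div>'
)
soup = BeautifulSoup(html, "html.parser")

# Match on one class token at a time.
by_row = soup.find_all("div", attrs={"class": "row"})
by_result = soup.find_all("div", attrs={"class": "result"})

# Require both tokens at once with a CSS selector.
by_both = soup.select("div.row.result")

# Collect the ids so the three result sets are easy to compare.
row_ids = [tag["id"] for tag in by_row]
result_ids = [tag["id"] for tag in by_result]
both_ids = [tag["id"] for tag in by_both]
print(row_ids, result_ids, both_ids)
```

On this small sample all three lookups select the same div. On a real page the single-class lookups may return more elements than the combined selector.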
Answer 1: (score: 0)
The following test is based on your div elements:
div_test='<div class=rbHeader><span role="heading" aria-level="3" class="ws_bold">Experience Level</span></div><div class=" row result" id="p_bc0437dce636c6f4" data-jk="bc0437dce636c6f4" itemscope itemtype="http://schema.org/JobPosting" data-tn-component="organicJob"></div>'
target = bs4.BeautifulSoup(div_test,'html.parser')
1. Class names are case-sensitive, so your code
targetElements = target.findAll('div', attrs={'class':'rbheader'})
print targetElements
matches nothing and returns [], whereas
targetElements = target.findAll('div', attrs={'class':'rbHeader'})
print targetElements
will give you:
[<div class="rbHeader"><span aria-level="3" class="ws_bold" role="heading">Experience Level</span></div>]
2. The code:
targetElements = target.findAll('div', attrs={'class':' row result'})
print targetElements
gives you a result here rather than an empty list:
[<div class=" row result" data-jk="bc0437dce636c6f4" data-tn-component="organicJob" id="p_bc0437dce636c6f4" itemscope="" itemtype="http://schema.org/JobPosting"></div>]
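The question reports [] for ' row result' with lxml while this test reports a match with html.parser, so matching on the raw attribute string looks fragile. One way to avoid depending on it is to compare class tokens instead. The sketch below assumes Python 3 and bs4 with html.parser. `has_classes` is a hypothetical helper written for this example, not part of bs4, and the id "job" is a placeholder.

```python
from bs4 import BeautifulSoup

# Shortened markup; the id "job" is a placeholder for this example.
html = '<div class=rbHeader></div><div class=" row result" id="job"></div>'
soup = BeautifulSoup(html, "html.parser")


def has_classes(*wanted):
    """Build a filter that accepts tags carrying every class in `wanted`."""
    wanted = set(wanted)

    def matcher(tag):
        # A parsed class attribute is a list of tokens; it is None when absent.
        return wanted.issubset(tag.get("class") or [])

    return matcher


# find_all accepts a function that takes a tag and returns True or False.
hits = soup.find_all(has_classes("row", "result"))
hit_ids = [tag["id"] for tag in hits]

# Class matching is case-sensitive: "rbheader" is not "rbHeader".
lower = soup.find_all("div", class_="rbheader")
exact = soup.find_all("div", class_="rbHeader")
print(hit_ids, len(lower), len(exact))
```

Because the filter works on tokens, the leading space and the order of the classes in the HTML no longer matter.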