I need to extract these values from the HTML code:
2018-04-01
1,500,552
7,211
3,710
I have used find_all before, but my problem is that I don't know how to locate the elements in this HTML.
Here is my code:
# Python 2 script (print statements, raw_input)
from bs4 import BeautifulSoup as Soup
import requests

# Prompts are in Spanish: "Fecha Inicio" = start date, "Fecha Fin" = end date
print 'Fecha Inicio ej:2018-04-01'
start = raw_input()
print 'Fecha Fin ej:2018-04-01'
end = raw_input()
glob2 = []
urls = ['http://url.com/rtbpartners/report.php?partner=id&date_from={}&date_to={}&interval=daily'.format(start, end)]
for item in urls:
    # Download the report page and parse it
    data = requests.get(item)
    data = data.text
    print data
    soup = Soup(data, "html.parser")
    print soup.find_all('tr')
Sample HTML:
<!DOCTYPE html>
<html>
<head>
<link rel="icon" href="../images/favicon.ico" type="image/x-icon">
<title>AdMedia Online Ad Network | Affiliate Advertising Solutions</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<!-- Bootstrap -->
<link href="../css/admedia_styles.css" rel="stylesheet" media="screen">
<link href="../css/admedia_content_styles.css" rel="stylesheet" media="screen">
<link href="../css/chosen.css" rel="stylesheet" media="screen">
<style type="text/css">
<!--
.style1 {color: #00FF00}
-->
</style>
<link rel="stylesheet" href="http://code.jquery.com/ui/1.10.2/themes/smoothness/jquery-ui.css" />
<script src="http://code.jquery.com/jquery-1.9.1.js"></script>
<script src="http://code.jquery.com/ui/1.10.2/jquery-ui.js"></script>
<script language="javascript">
$(function() {
//restricting min date to 2012-09-25 to use the new report table
$( "#datepicker1" ).datepicker({ dateFormat: 'yy-mm-dd', minDate: (new Date(2012, 09-1, 25)) });
$( "#datepicker2" ).datepicker({ dateFormat: 'yy-mm-dd', minDate: (new Date(2012, 09-1, 25)) });
});
</script>
<!--[if IE]>
<script type="text/javascript">
document.createElement("article");
document.createElement("nav");
document.createElement("section");
document.createElement("header");
document.createElement("aside");
document.createElement("figure");
document.createElement("legend");
document.createElement("footer");
</script>
<![endif] -->
</head>
<body >
<a name="top"></a>
<header id="main-header">
<div class="container">
<a href="http://admedia.com" class="admedia-logo-wrapper"><span class="admedia-logo"><span class="admedia-logo-text">a</span><span class="admedia-logo-dot">d</span></span></a>
<!--
<ul class="top-right-links clearfix">
<li class="first"><a class="call-link" href="tel:18002967104"><span class="admedia-icon icon-phone" aria-hidden="true"></span><span class="text">Call: (800) 296-7104</span></a></li>
<li class="hidden-phone"> | </li>
<li>
<a href="/contact-us/" class="contact-link"><span class="admedia-icon icon-bubbles" aria-hidden="true"></span><span class="text">Contact Us</span></a>
<ul class="main-sub-navigation">
<li><a href="/contact-us/">Contact</a></li>
<li><a href="/contact-us/support_ticket/">Support Ticket</a></li>
<li><a href="http://help.admedia.com">Help Center</a></li>
</ul>
</li>
</ul>-->
<a id="main-navigation-dropdown-toggle" href="#">
<span class="icon-navigation" aria-hidden="true"></span>
</a>
<!--<div class=" scroll-hint scroll-hint-main"></div>-->
<a href="#" class="scroll-hint main-scroll-hint scroll-hint-main-top-arrow"><span class="scroll-hint-icon icon-chevron-sign-up" aria-hidden="true"></span></a>
<a href="#" class="scroll-hint main-scroll-hint scroll-hint-main-bottom-arrow"><span class="scroll-hint-icon icon-chevron-sign-down" aria-hidden="true"></span></a>
</div>
</header>
<div style="margin-top: 35px; margin-left: 20px;">
<h2>RTB DSP Stats</h2>
<br>
<form name="stats" method="get" action="/rtbpartners/report.php">
<input type="hidden" name="partner" value="empresa">
<input type="hidden" name="key" value="key">
<table border='0' cellpadding='15' cellspacing='10'>
<tr>
<td>Date: </td>
<td><input type="text" style="width:80px" name="date_from" id="datepicker1" value="2018-04-01"> to <input type="text" style="width:80px" name="date_to" id="datepicker2" value="2018-04-01"> </td>
</tr>
<tr>
<td>Select Interval: </td>
<td>
<select name="interval">
<option value="daily" selected>Daily</option>
<option value="hourly" >Hourly</option>
</select>
</td>
</tr>
<tr>
<td colspan="2"><input type="submit" value="Update"></td>
</tr>
</table>
</form>
<br><br>
<table width="100%" class="sortable">
<thead>
<tr>
<th style="padding-left: 5px; padding-right: 5px; border-bottom: 1px solid #CDCDCD; background-color: #A9C0C2;" align="left">
<b>Date</b>
</th>
<th style="padding-left: 5px; padding-right: 5px; border-bottom: 1px solid #CDCDCD; background-color: #A9C0C2;" align="left">
<b>Requests</b>
</th>
<th style="padding-left: 5px; padding-right: 5px; border-bottom: 1px solid #CDCDCD; background-color: #A9C0C2;" align="left">
<b>Responses</b>
</th>
<th style="padding-left: 5px; padding-right: 5px; border-bottom: 1px solid #CDCDCD; background-color: #A9C0C2;" align="left">
<b>Impressions</b>
</th>
<th style="padding-left: 5px; padding-right: 5px; border-bottom: 1px solid #CDCDCD; background-color: #A9C0C2;" align="left">
<b>Spend</b>
</th>
</tr>
</thead>
<tbody>
<tr align=center>
<td style="padding: 2px; border-bottom: 1px solid #CDCDCD; background-color: #;" align="left">
2018-04-01 </td>
<td style="padding: 2px; border-bottom: 1px solid #CDCDCD; background-color: #;" align="left">
1,500,552 </td>
<td style="padding: 2px; border-bottom: 1px solid #CDCDCD; background-color: #;" align="left">
7,211 </td>
<td style="padding: 2px; border-bottom: 1px solid #CDCDCD; background-color: #;" align="left">
3,710 </td>
<td style="padding: 2px; border-bottom: 1px solid #CDCDCD; background-color: #;" align="left">
1.43 </td>
</tr>
</tbody>
<tfoot>
<tr>
<td style="padding-left: 5px; padding-right: 20px; border-bottom: 1px solid #CDCDCD; background-color: #A9C0C2;"
align="right">
<b>Total:</b>
</td>
<td style="padding-left: 5px; padding-right: 5px; border-bottom: 1px solid #CDCDCD; background-color: #A9C0C2;"
align="left">
<b>1,500,552</b>
</td>
<td style="padding-left: 5px; padding-right: 5px; border-bottom: 1px solid #CDCDCD; background-color: #A9C0C2;"
align="left">
<b>7,211</b>
</td>
<td style="padding-left: 5px; padding-right: 5px; border-bottom: 1px solid #CDCDCD; background-color: #A9C0C2;"
align="left">
<b>3,710</b>
</td>
<td style="padding-left: 5px; padding-right: 5px; border-bottom: 1px solid #CDCDCD; background-color: #A9C0C2;"
align="left">
<b>1.43</b>
</td>
</tr>
</tfoot>
</table>
</div>
</body>
</html>
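Answer 0 (score: 0)
You never retrieve any elements before the for loop, so the loop has nothing to search through. I suggest calling find_all() before the for loop and then iterating over its result. After that, add further for loops that walk through all the tags until you reach the specific ones you want. Include some if checks, such as: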
if tag.name == "td":
    pass  # code here
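The loop-and-check approach described above could be sketched as follows. This is only an illustration under assumptions: `SAMPLE_HTML` is a trimmed stand-in for the report page, and `extract_cells` is a hypothetical helper name that does not appear in the question.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the report page (assumption: the real page
# uses the same <table>/<tbody>/<tr>/<td> layout).
SAMPLE_HTML = """
<table class="sortable">
<thead><tr><th><b>Date</b></th><th><b>Requests</b></th></tr></thead>
<tbody><tr align=center>
<td align="left">
2018-04-01 </td>
<td align="left">
1,500,552 </td>
</tr></tbody>
</table>
"""


def extract_cells(html):
    """Return the stripped text of every <td> tag in the document."""
    soup = BeautifulSoup(html, "html.parser")
    values = []
    # find_all(True) yields every tag; the if check keeps only <td>.
    for tag in soup.find_all(True):
        if tag.name == "td":
            values.append(tag.get_text(strip=True))
    return values


print(extract_cells(SAMPLE_HTML))
```

On the full page this would also return the cells of the date form and of the totals row, so a further filter (for example on the parent element) would still be needed.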
I also suggest looking at lxml, which lets you use XPath to find specific items on a web page.
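A minimal sketch of the XPath route, assuming lxml is installed. `SAMPLE_HTML` is a trimmed stand-in for the report page, and the XPath expression relies on the `sortable` class and the explicit `<tbody>` shown in the sample HTML; it is one possible expression, not the only one.

```python
from lxml import html

# Trimmed stand-in for the report page (assumption: same table layout).
SAMPLE_HTML = """
<html><body>
<table width="100%" class="sortable">
<thead><tr><th><b>Date</b></th></tr></thead>
<tbody><tr align=center>
<td align="left">
2018-04-01 </td>
<td align="left">
1,500,552 </td>
<td align="left">
7,211 </td>
<td align="left">
3,710 </td>
<td align="left">
1.43 </td>
</tr></tbody>
<tfoot><tr><td><b>Total:</b></td></tr></tfoot>
</table>
</body></html>
"""

tree = html.fromstring(SAMPLE_HTML)
# Text nodes of every <td> under the <tbody> of the "sortable" table;
# the header and totals rows live in <thead>/<tfoot> and are skipped.
cells = tree.xpath('//table[@class="sortable"]/tbody/tr/td/text()')
values = [cell.strip() for cell in cells]
print(values)
```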
Answer 1 (score: 0)
Something like this should work:
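import bs4

# `content` holds the HTML text of the page (for example, data.text from requests)
soup = bs4.BeautifulSoup(content, 'lxml')
for table_row in soup.find_all(name="tr"):
    # Keep only the data rows, which sit directly under <tbody>
    if table_row.parent.name == "tbody":
        for cell in table_row.find_all("td"):
            print(cell.getText().strip())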
It uses BeautifulSoup4 on Python 3, but you should have no trouble adapting it to Python 2.
Result:
2018-04-01
1,500,552
7,211
3,710
1.43
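If the values are needed as data rather than printed text, the same row filtering can be combined with the header cells. The following is a rough sketch under assumptions: `SAMPLE_HTML` is a trimmed stand-in for the report page, and `parse_report` and `to_int` are hypothetical helper names introduced here for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the report page (assumption: same table layout).
SAMPLE_HTML = """
<table width="100%" class="sortable">
<thead><tr>
<th><b>Date</b></th><th><b>Requests</b></th><th><b>Responses</b></th>
<th><b>Impressions</b></th><th><b>Spend</b></th>
</tr></thead>
<tbody><tr align=center>
<td>
2018-04-01 </td>
<td>
1,500,552 </td>
<td>
7,211 </td>
<td>
3,710 </td>
<td>
1.43 </td>
</tr></tbody>
<tfoot><tr><td><b>Total:</b></td><td><b>1,500,552</b></td></tr></tfoot>
</table>
"""


def parse_report(html):
    """Return one dict per <tbody> row, keyed by the <thead> column names."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="sortable")
    headers = [th.get_text(strip=True) for th in table.thead.find_all("th")]
    rows = []
    for tr in table.tbody.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append(dict(zip(headers, cells)))
    return rows


def to_int(text):
    """Convert a string with thousands separators, e.g. '7,211', to an int."""
    return int(text.replace(",", ""))


rows = parse_report(SAMPLE_HTML)
print(rows[0]["Date"], to_int(rows[0]["Requests"]))
```

Keying each row by its column name makes it easy to pick only the four values the question asks for and to ignore the Spend column.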