I'm not very confident with HTML, and I'm having trouble using Python to parse this piece of HTML (the output of print soup.prettify()).
$("#global-flash").html("");
$('#reviews-tab-navigation').trigger('repaint');
$('#edit-review-tab').html('
<div class='\"row-fluid\"'>
\n
<div class='\"span3\"'>
\n
<div class='\"label' full-height="" id='\"review-search-result-panel\"' use-bootstrap-tables\"="">
\n
<span class='\"panel-headline\"'>
Rezensionsdaten<\/span>\n
<hr/>
\n\n
<table class='\"table' id='\"review-search-result-list\"' table-hover="" table-striped\"="">
\n
<thead>
\n
<tr>
\n
<th>
\n
<span class='\"review-count\"'>
5<\/span>\n\n Rezensionen gefunden\n <\/th>\n <\/tr>\n <\/thead>\n\n
<tbody>
\n
<tr>
\n
<td class='\"selectable-review-entry\"' data-mastertstyle-id='\"\"' data-review-id='\"10613555\"'>
\n
<span btn-link="" btn-small="" class='\"btn' review-list-link\"="">
\n 5\n
<img 2015\"="" alt='\"Bewertung' src='\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\"' stern=""/>
\n\n
<span aderisce="" anzi="" bene="" colore="" come="" difettucci="" e="" foto,="" i="" in="" morbidissima,="" non="" pelle,="" piacevole="" rotolini.\"="" segnare="" senza="" stringe="" sulla="" title='\"Bel'>
Bel colore come in foto, morbidissima, piacevole sulla pelle, non stringe anzi aderisce bene senza segnare i difettucci e i rotolini.<\/span>\n <\/span>\n <\/td>\n <\/tr>\n
<tr>
\n
<td class='\"selectable-review-entry\"' data-mastertstyle-id='\"\"' data-review-id='\"10610141\"'>
\n
<span btn-link="" btn-small="" class='\"btn' review-list-link\"="">
\n 5\n
<img 2015\"="" alt='\"Bewertung' src='\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\"' stern=""/>
\n\n
<span title='\"bella\"'>
bella<\/span>\n <\/span>\n <\/td>\n <\/tr>\n
<tr>
\n
<td class='\"selectable-review-entry\"' data-mastertstyle-id='\"\"' data-review-id='\"10575319\"'>
\n
<span btn-link="" btn-small="" class='\"btn' review-list-link\"="">
\n 4\n
<img 2015\"="" alt='\"Bewertung' src='\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\"' stern=""/>
\n\n
<span buona="" morbido.\"="" qualità-prezzo,="" rapporto="" tessuto="" title='\"Buon' vestibilità,="">
Buon rapporto qualità-prezzo, buona vestibilità, tessuto morbido.<\/span>\n <\/span>\n <\/td>\n <\/tr>\n
<tr>
\n
<td class='\"selectable-review-entry\"' data-mastertstyle-id='\"\"' data-review-id='\"10554514\"'>
\n
<span btn-link="" btn-small="" class='\"btn' review-list-link\"="">
\n 5\n
<img 2015\"="" alt='\"Bewertung' src='\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\"' stern=""/>
\n\n
<span buon="" capo!="" giusto="" ottima="" peso\"="" qualità,="" title='\"Davvero' un="">
Davvero un buon capo! Ottima qualità, giusto peso<\/span>\n <\/span>\n <\/td>\n <\/tr>\n
<tr>
\n
<td class='\"selectable-review-entry\"' data-mastertstyle-id='\"\"' data-review-id='\"9469234\"'>
\n
<span btn-link="" btn-small="" class='\"btn' review-list-link\"="">
\n 5\n
<img 2015\"="" alt='\"Bewertung' src='\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\"' stern=""/>
\n\n
<span ....="" altri="" anche="" bello="" colori="" e="" funzionale.="" in="" regolare.\"="" taglia="" title='\"Preso'>
Preso anche in altri colori .... bello e funzionale. Taglia regolare.<\/span>\n <\/span>\n <\/td>\n <\/tr>\n <\/tbody>\n <\/table>\n <\/div>\n <\/div>\n\n
<div class='\"span9\"'>
\n
<div class='\"row-fluid\"'>
\n
<div class='\"span3\"'>
\n
<div class='\"label' full-height\"="" id='\"product-data-panel\"'>
\n
<span class='\"panel-headline\"'>
Informazioni articolo<\/span>\n
<hr/>
\n
<a href='\"https://www.bonprix.it/search.htm?qu=95341195\"' target='\"_blank\"'>
<img src="\'http://image01.bonprix.de/bonprixbilder//assets/114x160/13050022.jpg\'"/>
<\/a>\n
<label>
N. art.<\/label>\n
<a class='\"btn-link\"' href='\"https://www.bonprix.it/search.htm?qu=95341195\"' target='\"_blank\"'>
95341195<\/a>\n
<label>
Masterstyle-ID<\/label>\n52826321\n
<label>
Digistyle-ID<\/label>\n12709620\n
<label>
Ø Media dei voti<\/label>\n4.45
<img 2015\"="" alt='\"Bewertung' src='\"http://bp-webtools1.otto.boreus.de/tools/images/app/reviews/bewertung_stern_2015.png\"' stern="">
\n
<label>
Lunghezza<\/label>\nGiusto\n
<label>
Larghezza<\/label>\nGiusto\n
<label>
Disponibilità<\/label>\n\n(37)\n\n <\/div>\n <\/div>\n\n
<div class='\"span5\"'>
\n
<div class='\"label' full-height\"="" id='\"single-review-panel\"'>
\n
<span class='\"panel-headline\"'>
Dati cliente<\/span>\n
<hr/>
\n
<table class='\"customer-info-table\"'>
\n
<tr>
\n
<td>
\n
<label>
Nome<\/label>\n nome\n <\/td>\n
<td>
\n
<label>
Cognome<\/label>\n cognome\n <\/td>\n <\/tr>\n
<tr>
\n
<td>
\n
<label>
Codice cliente<\/label>\n N/A\n <\/td>\n
<td>
\n
<label>
Indirizzo e-mail<\/label>\n ********@gmail.com\n <\/td>\n <\/tr>\n<\/table>\n\n
<span class='\"panel-headline\"'>
Commento articolo<\/span>\n
<hr/>
\n
<i class='\"rating' r5\"="">
<\/i>
<br/>
\n\n
<textarea id='\"review-text\"' name='\"text\"' readonly='\"readonly\"' rows='\"12\"'>\nBel colore come in foto, morbidissima, piacevole sulla pelle, non stringe anzi aderisce bene senza segnare i difettucci e i rotolini.<\/textarea>\n\n<span class='\"panel-headline\"'>Commenti sulla vestibilità<\/span>\n<hr/>\n<table class='\"size-info-table\"'>\n <tr>\n <td>\n <label>Lunghezza<\/label>\n Giusto\n <\/td>\n <td>\n <label>Larghezza<\/label>\n Giusto\n <\/td>\n <td>\n <label>Taglia<\/label>\n 62/64\n <\/td>\n <td>\n <label>Varianti<\/label>\n \n <\/td>\n <td>\n <label>Statura<\/label>\n 165-169\n <\/td>\n <\/tr>\n<\/table>\n<p>\n <table class='\"table\"'>\n <tr>\n <td>\n <b>Rezensions-ID:<\/b>\n <span id='\"review-id\"'>10613555<\/span>\n <\/td>\n <td>\n <b>Creata:<\/b>\n <span class='\"utc-date\"'>\n 01.10.2017 11:06:26\n <\/span>\n <\/td>\n <\/tr>\n <tr>\n <td>\n <b>Letzte Änderung<\/b>\n <span class='\"utc-date\"'>\n 01.10.2017 11:06:26\n <\/span>\n <\/td>\n <td>\n <b>di<\/b>\n Kunde\n <\/td>\n <\/tr>\n <tr>\n <td>\n <b>Data pubblicazione:<\/b>\n <span class='\"utc-date\"'>\n 01.10.2017 11:06:26\n <\/span>\n <\/td>\n <\/tr>\n <\/table>\n<\/p>\n\n <\/div>\n <\/div>\n\n <div class='\"span4\"'>\n <div class='\"label' full-height\"="" id='\"editing-functions-panel\"'>\n <span class='\"panel-headline\"'>Modifica<\/span>\n<hr/>\n<div>\n <label>Scegli un destinatario<\/label>\n <a class='\"btn-link\"' false;\"="" href='\"#\"' id='\"reset-recipients-list-link\"' onclick='\"reviews.resetRecipientsList(true);' return="">Cancella la lista destinatari<\/a>\n <select id='\"email-recipients-select\"' name='\"email-recipients-select\"'><option value='\"\"'><\/option>\n<option value='\"*****@*****.it\"'>servizio@******.it<\/option><\/select>\n <textarea id='\"email-recipients-textarea\"' name='\"email-recipients-textarea\"'>\n<\/textarea>\n <a class='\"btn\"' data-confirm-translation-modified-text='\"Die' false;\"="" gespeichert.="" href='\"#\"' id='\"send-mail-btn\"' nicht="" 
noch="" onclick='\"reviews.sendMail(true);' return="" rezension="" trotzdem="" versenden?\"="" wurde="" übersetzung="">Invia recensione<\/a>\n <label>Traduci<\/label>\n <textarea id='\"review-uebersetzung\"' name='\"text\"'>\n<\/textarea>\n <label>Feedback al cliente<\/label>\n <textarea id='\"review-feedbackToCustomer\"' name='\"text\"'>\n<\/textarea>\n<\/div>\n<div>\n <label>Tipo di recensione<\/label>\n <select id='\"review-meinungstyp\"' name='\"meinungstyp\"'><option selected='\"selected\"' value='\"R\"'>Recensione<\/option>\n<option value='\"G\"'>Risposte<\/option>\n<option value='\"A\"'>Archivio<\/option><\/select>\n<\/div>\n<div id='\"aktiv-checkboxes-container\"'>\n <div class='\"control-group' use-bootstrap-groups\"="">\n <label class='\"control-label\"' for='\"review_aktiv\"'>Pubblicata<\/label>\n <input id='\"review_aktiv\"' name='\"review_aktiv\"' type='\"hidden\"' value='\"T\"'/>\n <div class='\"controls\"'>\n <div class='\"btn-group\"'>\n <a btn="" btn-success\"="" class='\"change-active-state' data-value='\"T\"' href='\"#\"'>Sì<\/a>\n <a \"="" btn="" class='\"change-active-state' data-value='\"F\"' href='\"#\"'>No<\/a>\n <\/div>\n <\/div>\n <\/div>\n<\/div>\n\n<div class='\"row-fluid' form-actions="" possible-multi-line\"="">\n <a btn-primary\"="" class='\"btn' false;\"="" href='\"#\"' id='\"save-review-btn\"' onclick='\"reviews.saveReview(true);' remote='\"true\"' return="">Salva recensione<\/a>\n <a btn-danger\"="" class='\"btn' data-confirm-dialog-title='\"Cancella' false;\"="" href='\"#\"' id='\"delete-review-btn\"' onclick='\"reviews.deleteSelectedReview(true);' recensioni\"="" remote='\"true\"' return=""><i class="\'icon-trash" icon-white\'=""><\/i> Cancella recensioni<\/a>\n<\/div>\n\n <\/div>\n <\/div>\n <\/div>\n <\/div>\n<\/div>\n').trigger('repaint');
reviews.initEditReviewTab();
$('#reviews-tab-navigation').tabs('option', 'active', 0);
$('.search-tab-buttons').html('<div class='\"search-tab-buttons\"'>\n <table>\n <tr>\n <td><a btn-primary\"="" class='\"btn' false;\"="" href='\"#\"' onclick='\"reviews.submitSearchReviews();' remote='\"true\"' return="">Cerca<\/a><\/td>\n <td><a btn-default\"="" class='\"btn' false;\"="" href='\"#\"' onclick='\"reviews.setDefaultSearchParams();' remote='\"true\"' return="">Ricerca standard<\/a><\/td>\n <td><a btn-default\"="" class='\"btn' false;\"="" href='\"#\"' onclick='\"reviews.showStatistics(true);' remote='\"true\"' return="">Statistiche<\/a><\/td>\n <\/tr>\n <\/table>\n<\/div>');
$('.mini-statistics').replaceWith(' <div class='\"mini-statistics\"'>\n <p>\n Da controllare: 100 / Pubblicata: 304316 / Non pubblicata: 9207 / Prenotate: [0], mie: [0]\n <\/p>\n <\/div>\n');
</p></div></a></td></a></td></a></td></tr></table></div></i></a></a></div></a></a></div></div></label></div></div></option></option></option></select></label></div></textarea></label></textarea></label></a></textarea></option></option></select></a></label></div></span></div></div></span></b></td></tr></b></td></span></b></td></tr></span></b></td></span></b></td></tr></table></p></label></td></label></td></label></td></label></td></label></td></tr></table></span></textarea>
</i>
</span>
</label>
</td>
</label>
</td>
</tr>
</label>
</td>
</label>
</td>
</tr>
</table>
</span>
</div>
</div>
</label>
</label>
</label>
</img>
</label>
</label>
</label>
</a>
</label>
</a>
</span>
</div>
</div>
</div>
</div>
</span>
</span>
</td>
</tr>
</span>
</span>
</td>
</tr>
</span>
</span>
</td>
</tr>
</span>
</span>
</td>
</tr>
</span>
</span>
</td>
</tr>
</tbody>
</span>
</th>
</tr>
</thead>
</table>
</span>
</div>
</div>
</div>
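Basically, I want to extract the number that follows each "data-review-id". (There are 5 in this piece of HTML: 10613555, 10610141, 10575319, 10554514, 9469234.) But I don't understand which tags I should select to get the result I want.
I have tried several combinations of soup.find_all, but without any result.
Any help or suggestion would be much appreciated.
Thanks in advance!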
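Answer 0 (score: 0)
The HTML you have sits inside some JavaScript and appears to have been escaped. After copying and pasting the exact HTML you provided and assigning it to html, the following can be used: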
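from bs4 import BeautifulSoup

html = """ ---- add HTML here ---"""
# In a regular (non-raw) string literal, \" is already read as ", so dropping
# the double quotes leaves clean attribute values; \/ is turned back into /.
html = html.replace('"', '').replace(r'\/', '/')
soup = BeautifulSoup(html, "html.parser")

# Select only the <td> elements that carry a data-review-id attribute
for td in soup.find_all('td', {'data-review-id': True}):
    print(td['data-review-id'])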
This then displays:
10613555
10610141
10575319
10554514
9469234
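If only the IDs are needed, a regular expression applied directly to the escaped text may be enough, with no HTML parsing at all. The following is a minimal sketch rather than part of the original answer: the function name extract_review_ids and the shortened sample string are made up for illustration, and it assumes every ID is a plain run of digits.

```python
import re

# Hypothetical sample: two shortened <td> lines in the same escaped form
# as the prettified output shown in the question.
sample = r"""
<td class='\"selectable-review-entry\"' data-mastertstyle-id='\"\"' data-review-id='\"10613555\"'>
<td class='\"selectable-review-entry\"' data-mastertstyle-id='\"\"' data-review-id='\"10610141\"'>
"""


def extract_review_ids(text):
    """Return every numeric data-review-id value found in text, in order."""
    # \W* skips whatever mix of quotes and backslashes sits between
    # the '=' sign and the first digit of the ID.
    return re.findall(r"data-review-id=\W*(\d+)", text)


print(extract_review_ids(sample))
```

Because the pattern skips any non-word characters after the equals sign, it should work on both the escaped form (\"10613555\") and the unescaped form ("10613555"). It would miss IDs that contain non-digit characters, so the BeautifulSoup approach above is the safer choice if the attribute format can vary.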