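I'm new to Java and have been trying to scrape some websites.
On other sites I could find elements by their "id = xxx" value. Unfortunately, the site I'm currently practicing on doesn't use IDs.
How can I use HtmlUnit to parse the table and pick out the items I want?
I've pasted the relevant part of the site's HTML at the end. The items I want to scrape are listed below the table's first row; for example, I want to extract 36 (the term), grade D3, loan amount $6,800 and so on. I would do the same for every row.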
<td class="loan-id" ng-click="row.toggleDetails()"><a href=""> <!-- ngIf: loan.hasProtect -->
<!-- ngIf: loan.isUppingLimits --> <!-- ngIf: loan.isBuyingDeeper --> <span
ng-bind-html="loan.name | softHyphens" class="ng-binding">LAI00100138</span> </a></td>
<td class="grade-and-rate">
<div><span class="grade grade-d" ng-class="row.getGradeClass()"></span> <span
class="grade-letter ng-binding">D3</span> <span class="rate ng-binding">27.12%</span></div>
</td>
<td class="term ng-binding">36</td>
<td class="default-rate ng-binding">1.95%</td>
<td class="loan-amount ng-binding">$6,800</td>
<td class="purpose ng-binding">Other</td>
Any help is appreciated.
The (relevant) HTML from the site:
<table>
<colgroup>
<col class="loan-id">
<col class="grade-and-rate">
<col class="term">
<col class="default-rate">
<col class="loan-amount">
<col class="purpose">
<col class="amount-funded">
<col class="time-remaining">
<col class="order-notes">
<col class="invested-amount">
<col class="transaction-fee">
</colgroup>
<thead>
<tr class="grouping">
<td colspan="8" class="loans-group wide"> <!-- featureFlag: HYBRID_MARKETPLACE_ACCESS is off --> LOANS</td>
<td colspan="5" class="loans-group narrow"> LOANS</td>
<td colspan="3" class="order-group">ORDER</td>
</tr>
<tr class="specific-headings">
<td class="loan-id" ng-class="{ 'sort-active': vm.isActiveSort('name') }"><a href=""
ng-click="vm.updateSortKey('name')">
LOAN ID </a></td>
<td class="grade-and-rate" ng-class="{ 'sort-active': vm.isActiveSort('grade') }"><a href=""
ng-click="vm.updateSortKey('grade')">
Grade & rate* </a></td>
<td class="term" ng-class="{ 'sort-active': vm.isActiveSort('term') }"><a href=""
ng-click="vm.updateSortKey('term')">
<abbr title="Maximum term" translate="MAX_TERM_SHORT" class="ng-scope ng-binding">Max. term</abbr><br>
<aside>(months)</aside>
</a></td>
<td class="default-rate" ng-class="{ 'sort-active': vm.isActiveSort('expectedDefaultRate') }"><a href=""
ng-click="vm.updateSortKey('expectedDefaultRate')">
<abbr title="Estimated annual default rate" translate="ESTIMATED_DEFAULT_RATE_SHORT" class="ng-scope">Est.
default rate</abbr>** </a></td>
<td class="loan-amount" ng-class="{ 'sort-active': vm.isActiveSort('amount') }"><a href=""
ng-click="vm.updateSortKey('amount')">
Loan amount </a></td>
<td class="purpose" ng-class="{ 'sort-active': vm.isActiveSort('loanPurpose') }"><a href=""
ng-click="vm.updateSortKey('loanPurpose')">
Purpose </a></td>
<td class="amount-funded" ng-class="{ 'sort-active': vm.isActiveSort('percentageFunded') }"><a href=""
ng-click="vm.updateSortKey('percentageFunded')">
Amount funded </a></td>
<td class="time-remaining sort-active" ng-class="{ 'sort-active': vm.isActiveSort('expiryDate') }"><a href=""
ng-click="vm.updateSortKey('expiryDate')">
Time remaining </a></td>
<td class="order-notes"> $25 multiples</td>
<td class="order-amount-fee" colspan="2"> Amount <br> <!-- featureFlag: TRANSACTION_FEE is off -->
<aside feature-flag="TRANSACTION_FEE" feature-flag-hide="" class="ng-scope">(value / percentage)</aside>
</td>
</tr>
</thead> <!-- ngRepeat: loan in vm.data | page: vm.limit | unique: 'id' track by loan.id -->
<tbody ng-repeat="loan in vm.data | page: vm.limit | unique: 'id' track by loan.id"
ng-controller="BrowseMarketplaceRowController as row" ng-init="row.setLoan(loan); row.setAccount(vm.account)"
class="ng-scope" style="">
<tr>
<td class="loan-id" ng-click="row.toggleDetails()"><a href=""> <!-- ngIf: loan.hasProtect -->
<!-- ngIf: loan.isUppingLimits --> <!-- ngIf: loan.isBuyingDeeper --> <span
ng-bind-html="loan.name | softHyphens" class="ng-binding">LAI00100138</span> </a></td>
<td class="grade-and-rate">
<div><span class="grade grade-d" ng-class="row.getGradeClass()"></span> <span
class="grade-letter ng-binding">D3</span> <span class="rate ng-binding">27.12%</span></div>
</td>
<td class="term ng-binding">36</td>
<td class="default-rate ng-binding">1.95%</td>
<td class="loan-amount ng-binding">$6,800</td>
<td class="purpose ng-binding">Other</td>
<td class="amount-funded">
<hmy-progress-bar progress="33%" class="ng-isolate-scope"><span class="bar" style="width: 33%;"></span>
<span class="text ng-binding">33%</span></hmy-progress-bar>
<aside class="ng-binding"> 181 notes remaining</aside>
</td>
<td class="time-remaining ng-binding">14 days</td>
<td class="order-notes ng-pristine ng-valid ng-untouched ng-valid-min ng-valid-pattern"
ng-class="{ focused: row.isNotesFieldFocused, 'whole-loan': row.loan.isFractionalised === false }"
ng-form="orderNotesField" hmy-pass-click-to-input=""> <!-- ngIf: row.loan.isFractionalised !== false -->
<div class="position ng-binding ng-scope" ng-if="row.loan.isFractionalised !== false">
<!-- ngIf: orderNotesField.$error.min -->
<!-- ngIf: row.isUnderDiversified(row.orderCountTotal()) && !orderNotesField.$error.min -->
<!-- ngIf: row.loan.alreadyInvestedAmount > 0 --> <input name="order" tabindex="1"
ng-pattern="/^[0-9]*$/"
ng-model="row.orderCount"
ng-model-options="{ getterSetter: true, allowInvalid: true }"
ng-focus="row.isNotesFieldFocused = true"
ng-blur="row.isNotesFieldFocused = false"
ng-min="1"
class="ng-pristine ng-untouched ng-valid ng-empty ng-valid-min ng-valid-pattern"
type="number"></div>
<!-- end ngIf: row.loan.isFractionalised !== false --> <!-- ngIf: row.loan.isFractionalised === false -->
</td>
<td class="invested-amount ng-binding"></td> <!-- featureFlag: TRANSACTION_FEE is off -->
<td class="transaction-fee ng-binding ng-scope" feature-flag="TRANSACTION_FEE" feature-flag-hide=""></td>
</tr>
<tr> <!-- ngIf: row.isOpen --> </tr>
</tbody><!-- end ngRepeat: loan in vm.data | page: vm.limit | unique: 'id' track by loan.id -->
<tbody ng-repeat="loan in vm.data | page: vm.limit | unique: 'id' track by loan.id"
ng-controller="BrowseMarketplaceRowController as row" ng-init="row.setLoan(loan); row.setAccount(vm.account)"
class="ng-scope" style="">
<tr>
<td class="loan-id" ng-click="row.toggleDetails()"><a href=""> <!-- ngIf: loan.hasProtect -->
<!-- ngIf: loan.isUppingLimits --> <!-- ngIf: loan.isBuyingDeeper --> <span
ng-bind-html="loan.name | softHyphens" class="ng-binding">LAI00100140</span> </a></td>
<td class="grade-and-rate">
<div><span class="grade grade-b" ng-class="row.getGradeClass()"></span> <span
class="grade-letter ng-binding">B3</span> <span class="rate ng-binding">15.16%</span></div>
</td>
<td class="term ng-binding">60</td>
<td class="default-rate ng-binding">0.54%</td>
<td class="loan-amount ng-binding">$7,275</td>
<td class="purpose ng-binding">Other</td>
<td class="amount-funded">
<hmy-progress-bar progress="23%" class="ng-isolate-scope"><span class="bar" style="width: 23%;"></span>
<span class="text ng-binding">23%</span></hmy-progress-bar>
<aside class="ng-binding"> 223 notes remaining</aside>
</td>
<td class="time-remaining ng-binding">14 days</td>
<td class="order-notes ng-pristine ng-valid ng-untouched ng-valid-min ng-valid-pattern"
ng-class="{ focused: row.isNotesFieldFocused, 'whole-loan': row.loan.isFractionalised === false }"
ng-form="orderNotesField" hmy-pass-click-to-input=""> <!-- ngIf: row.loan.isFractionalised !== false -->
<div class="position ng-binding ng-scope" ng-if="row.loan.isFractionalised !== false">
<!-- ngIf: orderNotesField.$error.min -->
<!-- ngIf: row.isUnderDiversified(row.orderCountTotal()) && !orderNotesField.$error.min -->
<!-- ngIf: row.loan.alreadyInvestedAmount > 0 --> <input name="order" tabindex="2"
ng-pattern="/^[0-9]*$/"
ng-model="row.orderCount"
ng-model-options="{ getterSetter: true, allowInvalid: true }"
ng-focus="row.isNotesFieldFocused = true"
ng-blur="row.isNotesFieldFocused = false"
ng-min="1"
class="ng-pristine ng-untouched ng-valid ng-empty ng-valid-min ng-valid-pattern"
type="number"></div>
<!-- end ngIf: row.loan.isFractionalised !== false --> <!-- ngIf: row.loan.isFractionalised === false -->
</td>
<td class="invested-amount ng-binding"></td> <!-- featureFlag: TRANSACTION_FEE is off -->
<td class="transaction-fee ng-binding ng-scope" feature-flag="TRANSACTION_FEE" feature-flag-hide=""></td>
</tr>
<tr> <!-- ngIf: row.isOpen --> </tr>
</tbody><!-- end ngRepeat: loan in vm.data | page: vm.limit | unique: 'id' track by loan.id -->
<tbody ng-repeat="loan in vm.data | page: vm.limit | unique: 'id' track by loan.id"
ng-controller="BrowseMarketplaceRowController as row" ng-init="row.setLoan(loan); row.setAccount(vm.account)"
class="ng-scope" style="">
<tr>
<td class="loan-id" ng-click="row.toggleDetails()"><a href=""> <!-- ngIf: loan.hasProtect -->
<!-- ngIf: loan.isUppingLimits --> <!-- ngIf: loan.isBuyingDeeper --> <span
ng-bind-html="loan.name | softHyphens" class="ng-binding">LAI00100139</span> </a></td>
<td class="grade-and-rate">
<div><span class="grade grade-e" ng-class="row.getGradeClass()"></span> <span
class="grade-letter ng-binding">E2</span> <span class="rate ng-binding">33.95%</span></div>
</td>
<td class="term ng-binding">60</td>
<td class="default-rate ng-binding">3.52%</td>
<td class="loan-amount ng-binding">$7,075</td>
<td class="purpose ng-binding">Debt Consolidation</td>
<td class="amount-funded">
<hmy-progress-bar progress="29%" class="ng-isolate-scope"><span class="bar" style="width: 29%;"></span>
<span class="text ng-binding">29%</span></hmy-progress-bar>
<aside class="ng-binding"> 200 notes remaining</aside>
</td>
<td class="time-remaining ng-binding">14 days</td>
<td class="order-notes ng-pristine ng-valid ng-untouched ng-valid-min ng-valid-pattern"
ng-class="{ focused: row.isNotesFieldFocused, 'whole-loan': row.loan.isFractionalised === false }"
ng-form="orderNotesField" hmy-pass-click-to-input=""> <!-- ngIf: row.loan.isFractionalised !== false -->
<div class="position ng-binding ng-scope" ng-if="row.loan.isFractionalised !== false">
<!-- ngIf: orderNotesField.$error.min -->
<!-- ngIf: row.isUnderDiversified(row.orderCountTotal()) && !orderNotesField.$error.min -->
<!-- ngIf: row.loan.alreadyInvestedAmount > 0 --> <input name="order" tabindex="3"
ng-pattern="/^[0-9]*$/"
ng-model="row.orderCount"
ng-model-options="{ getterSetter: true, allowInvalid: true }"
ng-focus="row.isNotesFieldFocused = true"
ng-blur="row.isNotesFieldFocused = false"
ng-min="1"
class="ng-pristine ng-untouched ng-valid ng-empty ng-valid-min ng-valid-pattern"
type="number"></div>
<!-- end ngIf: row.loan.isFractionalised !== false --> <!-- ngIf: row.loan.isFractionalised === false -->
</td>
<td class="invested-amount ng-binding"></td> <!-- featureFlag: TRANSACTION_FEE is off -->
<td class="transaction-fee ng-binding ng-scope" feature-flag="TRANSACTION_FEE" feature-flag-hide=""></td>
</tr>
<tr> <!-- ngIf: row.isOpen --> </tr>
</tbody><!-- end ngRepeat: loan in vm.data | page: vm.limit | unique: 'id' track by loan.id -->
<tbody ng-repeat="loan in vm.data | page: vm.limit | unique: 'id' track by loan.id"
ng-controller="BrowseMarketplaceRowController as row" ng-init="row.setLoan(loan); row.setAccount(vm.account)"
class="ng-scope" style="">
<tr>
<td class="loan-id" ng-click="row.toggleDetails()"><a href=""> <!-- ngIf: loan.hasProtect -->
<!-- ngIf: loan.isUppingLimits --> <!-- ngIf: loan.isBuyingDeeper --> <span
ng-bind-html="loan.name | softHyphens" class="ng-binding">LAI00100142</span> </a></td>
<td class="grade-and-rate">
<div><span class="grade grade-f" ng-class="row.getGradeClass()"></span> <span
class="grade-letter ng-binding">F3</span> <span class="rate ng-binding">39.61%</span></div>
</td>
<td class="term ng-binding">60</td>
<td class="default-rate ng-binding">9.79%</td>
<td class="loan-amount ng-binding">$5,500</td>
<td class="purpose ng-binding">Holiday Expenses</td>
<td class="amount-funded">
<hmy-progress-bar progress="1%" class="ng-isolate-scope"><span class="bar" style="width: 1%;"></span>
<span class="text ng-binding">1%</span></hmy-progress-bar>
<aside class="ng-binding"> 217 notes remaining</aside>
</td>
<td class="time-remaining ng-binding">14 days</td>
<td class="order-notes ng-pristine ng-valid ng-untouched ng-valid-min ng-valid-pattern"
ng-class="{ focused: row.isNotesFieldFocused, 'whole-loan': row.loan.isFractionalised === false }"
ng-form="orderNotesField" hmy-pass-click-to-input=""> <!-- ngIf: row.loan.isFractionalised !== false -->
<div class="position ng-binding ng-scope" ng-if="row.loan.isFractionalised !== false">
<!-- ngIf: orderNotesField.$error.min -->
<!-- ngIf: row.isUnderDiversified(row.orderCountTotal()) && !orderNotesField.$error.min -->
<!-- ngIf: row.loan.alreadyInvestedAmount > 0 --> <input name="order" tabindex="4"
ng-pattern="/^[0-9]*$/"
ng-model="row.orderCount"
ng-model-options="{ getterSetter: true, allowInvalid: true }"
ng-focus="row.isNotesFieldFocused = true"
ng-blur="row.isNotesFieldFocused = false"
ng-min="1"
class="ng-pristine ng-untouched ng-valid ng-empty ng-valid-min ng-valid-pattern"
type="number"></div>
<!-- end ngIf: row.loan.isFractionalised !== false --> <!-- ngIf: row.loan.isFractionalised === false -->
</td>
<td class="invested-amount ng-binding"></td> <!-- featureFlag: TRANSACTION_FEE is off -->
<td class="transaction-fee ng-binding ng-scope" feature-flag="TRANSACTION_FEE" feature-flag-hide=""></td>
</tr>
<tr> <!-- ngIf: row.isOpen --> </tr>
</tbody><!-- end ngRepeat: loan in vm.data | page: vm.limit | unique: 'id' track by loan.id -->
<tbody ng-repeat="loan in vm.data | page: vm.limit | unique: 'id' track by loan.id"
ng-controller="BrowseMarketplaceRowController as row" ng-init="row.setLoan(loan); row.setAccount(vm.account)"
class="ng-scope" style="">
<tr>
<td class="loan-id" ng-click="row.toggleDetails()"><a href=""> <!-- ngIf: loan.hasProtect -->
<!-- ngIf: loan.isUppingLimits --> <!-- ngIf: loan.isBuyingDeeper --> <span
ng-bind-html="loan.name | softHyphens" class="ng-binding">LAI00100133</span> </a></td>
<td class="grade-and-rate">
<div><span class="grade grade-a" ng-class="row.getGradeClass()"></span> <span
class="grade-letter ng-binding">A5</span> <span class="rate ng-binding">13.25%</span></div>
</td>
<td class="term ng-binding">60</td>
<td class="default-rate ng-binding">0.27%</td>
<td class="loan-amount ng-binding">$35,575</td>
<td class="purpose ng-binding">Debt Consolidation</td>
<td class="amount-funded">
<hmy-progress-bar progress="1%" class="ng-isolate-scope"><span class="bar" style="width: 1%;"></span>
<span class="text ng-binding">1%</span></hmy-progress-bar>
<aside class="ng-binding"> 1,406 notes remaining</aside>
</td>
<td class="time-remaining ng-binding">14 days</td>
<td class="order-notes ng-pristine ng-valid ng-untouched ng-valid-min ng-valid-pattern"
ng-class="{ focused: row.isNotesFieldFocused, 'whole-loan': row.loan.isFractionalised === false }"
ng-form="orderNotesField" hmy-pass-click-to-input=""> <!-- ngIf: row.loan.isFractionalised !== false -->
<div class="position ng-binding ng-scope" ng-if="row.loan.isFractionalised !== false">
<!-- ngIf: orderNotesField.$error.min -->
<!-- ngIf: row.isUnderDiversified(row.orderCountTotal()) && !orderNotesField.$error.min -->
<!-- ngIf: row.loan.alreadyInvestedAmount > 0 --> <input name="order" tabindex="5"
ng-pattern="/^[0-9]*$/"
ng-model="row.orderCount"
ng-model-options="{ getterSetter: true, allowInvalid: true }"
ng-focus="row.isNotesFieldFocused = true"
ng-blur="row.isNotesFieldFocused = false"
ng-min="1"
class="ng-pristine ng-untouched ng-valid ng-empty ng-valid-min ng-valid-pattern"
type="number"></div>
<!-- end ngIf: row.loan.isFractionalised !== false --> <!-- ngIf: row.loan.isFractionalised === false -->
</td>
<td class="invested-amount ng-binding"></td> <!-- featureFlag: TRANSACTION_FEE is off -->
<td class="transaction-fee ng-binding ng-scope" feature-flag="TRANSACTION_FEE" feature-flag-hide=""></td>
</tr>
<tr> <!-- ngIf: row.isOpen --> </tr>
</tbody><!-- end ngRepeat: loan in vm.data | page: vm.limit | unique: 'id' track by loan.id -->
<tbody> <!-- ngIf: vm.areNoLoans() --> <!-- ngIf: vm.wasLoadingError --> <!-- ngIf: vm.isCurrentPageLoading() -->
<!-- ngIf: vm.isOnEmptyPage() --> <!-- ngIf: vm.areAllLoansFilteredOut() --> </tbody>
</table>
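Because none of these rows has an `id`, they can be located structurally instead: each loan sits in its own `<tbody>` produced by `ng-repeat`, the first `<tr>` of that `<tbody>` holds the cells, and each cell is identified by a token in its `class` attribute. Note that the attribute is `class="term ng-binding"`, so an exact string comparison with `"term"` fails; you have to match individual class tokens. The sketch below shows that walk with the JDK's own DOM parser on a trimmed, well-formed excerpt of the first row. This is an assumption-laden stand-in: the live page is not well-formed XML, so against the real site you would do the same walk over HtmlUnit's DOM (for example starting from `HtmlPage.getElementsByTagName("tbody")`) after the JavaScript has rendered the rows. The class `LoanTableWalker` and its helpers are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class LoanTableWalker {

    // Class tokens of the cells we want (grade and rate are <span>s inside td.grade-and-rate).
    static final String[] COLUMNS =
        {"loan-id", "grade-letter", "rate", "term", "default-rate", "loan-amount", "purpose"};

    // Trimmed, well-formed excerpt of the table: one loan <tbody> plus the empty trailing one.
    static final String SAMPLE =
        "<table><tbody class=\"ng-scope\"><tr>"
        + "<td class=\"loan-id\"><a href=\"\"><span class=\"ng-binding\">LAI00100138</span></a></td>"
        + "<td class=\"grade-and-rate\"><div><span class=\"grade-letter ng-binding\">D3</span>"
        + " <span class=\"rate ng-binding\">27.12%</span></div></td>"
        + "<td class=\"term ng-binding\">36</td>"
        + "<td class=\"default-rate ng-binding\">1.95%</td>"
        + "<td class=\"loan-amount ng-binding\">$6,800</td>"
        + "<td class=\"purpose ng-binding\">Other</td>"
        + "</tr><tr></tr></tbody><tbody></tbody></table>";

    // True if the element's space-separated class attribute contains `token` as a whole word.
    static boolean hasClass(Element el, String token) {
        for (String c : el.getAttribute("class").trim().split("\\s+")) {
            if (c.equals(token)) {
                return true;
            }
        }
        return false;
    }

    // Parses well-formed markup with the JDK parser; failures become unchecked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
    }

    // Returns one map (column class -> cell text) per <tbody> that actually holds a loan.
    static List<Map<String, String>> readLoans(Document doc) {
        List<Map<String, String>> loans = new ArrayList<>();
        NodeList bodies = doc.getElementsByTagName("tbody");
        for (int i = 0; i < bodies.getLength(); i++) {
            Element body = (Element) bodies.item(i);
            Map<String, String> loan = new LinkedHashMap<>();
            NodeList descendants = body.getElementsByTagName("*");
            for (int j = 0; j < descendants.getLength(); j++) {
                Element el = (Element) descendants.item(j);
                for (String column : COLUMNS) {
                    if (!loan.containsKey(column) && hasClass(el, column)) {
                        // getTextContent() ignores comments such as <!-- ngIf: ... -->
                        loan.put(column, el.getTextContent().trim());
                    }
                }
            }
            // Skip the trailing <tbody> that only holds loading/empty-state placeholders.
            if (loan.containsKey("loan-id")) {
                loans.add(loan);
            }
        }
        return loans;
    }

    public static void main(String[] args) {
        for (Map<String, String> loan : readLoans(parse(SAMPLE))) {
            System.out.println(loan);
        }
    }
}
```

Matching on class tokens rather than on position keeps the code working if columns are reordered, and skipping any `<tbody>` without a `loan-id` cell filters out the placeholder `<tbody>` at the end of the table.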
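Edit 1: I've pasted the full HTML of the page to pastebin (https://pastebin.com/TA3EqECQ). One thing I find interesting is that if I right-click the page and choose "View Page Source", I get a much smaller page, but it seems to contain JavaScript, which I'm guessing is what fills in the rest of the page. The HTML on pastebin came from right-clicking the page in Firefox and choosing "Inspect Element".
Edit 2: @Jobin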
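That guess is consistent with the markup: "View Page Source" shows the AngularJS template, and the loan rows only exist once the page's JavaScript has run `ng-repeat` over the loaded data. HtmlUnit does execute JavaScript, but parts of it (typically the requests that fetch the data) can still be running when `getPage` returns, so querying the table straight away may find it empty. HtmlUnit's `WebClient.waitForBackgroundJavaScript(long timeoutMillis)` waits for pending background jobs; a complementary pattern is to poll until the rows actually appear. The sketch below shows only the polling loop, using the JDK alone. With HtmlUnit, the condition might hypothetically be `() -> !loansPage.querySelectorAll("tbody.ng-scope").isEmpty()`. `Polling.waitUntil` and the timeout values are assumptions for illustration, not part of HtmlUnit.

```java
import java.util.function.BooleanSupplier;

class Polling {

    // Re-checks `condition` every `intervalMillis` until it holds or `timeoutMillis` elapses.
    // Returns true if the condition became true in time, false on timeout or interruption.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag for callers
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition that becomes true after ~200 ms, like rows appearing after an async render.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2_000, 50);
        System.out.println("rows ready: " + ready);
    }
}
```

Returning `false` on timeout, instead of throwing, lets the caller decide whether an empty table means "no loans listed" or "the page never finished loading".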
// Try logging in
try {
    // Get a new web client
    m_webClient = getNewWebClient();
    // Go to the login page
    final HtmlPage loginPage = m_webClient.getPage(m_settings.getLoginPage());
    // Get the form we are dealing with and, within that form,
    // find the submit button and the login fields we need to fill in.
    final HtmlForm form = loginPage.getFormByName(m_settings.getLoginForm());
    final HtmlSubmitInput button = form.getInputByName(m_settings.getLoginFormButton());
    final HtmlTextInput emailField = form.getInputByName(m_settings.getEmailField());
    final HtmlPasswordInput passwordField = form.getInputByName(m_settings.getPasswordField());
    // Enter the login values
    emailField.setValueAttribute(m_username);
    passwordField.setValueAttribute(m_password);
    // Now submit the form by clicking the button and get back the second page. This second page
    // is NOT the loans page; we need to navigate to that separately.
    // TODO check the login was successful
    postLoginPage = button.click();
} catch (Exception e) {
    // TODO post error to console
    // getNewWebClient() may itself have failed, so only close the client if it exists
    if (m_webClient != null) {
        m_webClient.close();
    }
    m_webClient = null;
    return null;
}
HtmlPage loansPage = null;
try {
    // Go to the loans page
    loansPage = m_webClient.getPage(m_settings.getBrowseLoansPage());
    // The loan rows are rendered by the page's JavaScript (AngularJS) after loading, so let
    // HtmlUnit finish its background JavaScript first (the 10 s timeout is an arbitrary choice)
    m_webClient.waitForBackgroundJavaScript(10_000);
    // TODO Parse the loans here
} catch (Exception e) {
    // TODO post error to console
}
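For the `// TODO Parse the loans here` step: once the cell texts have been pulled out of a row (for instance with HtmlUnit CSS selectors such as `row.querySelector("td.term")` followed by `getTextContent()`, an assumption about how you would select them), they still arrive as display strings like `"$6,800"`, `"27.12%"` and `"36"`. A minimal sketch of turning one row's strings into typed values with the JDK alone; the `Loan` class, its field names and `fromCells` are hypothetical, chosen to mirror the table's column classes.

```java
import java.math.BigDecimal;

// Hypothetical value class for one row of the loans table.
class Loan {
    final String id;
    final String grade;
    final BigDecimal ratePercent;
    final int termMonths;
    final BigDecimal defaultRatePercent;
    final BigDecimal amountDollars;
    final String purpose;

    Loan(String id, String grade, BigDecimal ratePercent, int termMonths,
         BigDecimal defaultRatePercent, BigDecimal amountDollars, String purpose) {
        this.id = id;
        this.grade = grade;
        this.ratePercent = ratePercent;
        this.termMonths = termMonths;
        this.defaultRatePercent = defaultRatePercent;
        this.amountDollars = amountDollars;
        this.purpose = purpose;
    }

    // "$6,800" -> 6800: drop the currency sign and thousands separators.
    static BigDecimal parseDollars(String text) {
        return new BigDecimal(text.trim().replace("$", "").replace(",", ""));
    }

    // "27.12%" -> 27.12: drop the trailing percent sign.
    static BigDecimal parsePercent(String text) {
        String t = text.trim();
        if (t.endsWith("%")) {
            t = t.substring(0, t.length() - 1).trim();
        }
        return new BigDecimal(t);
    }

    // Builds a Loan from the raw cell strings of one row, in table column order.
    static Loan fromCells(String id, String grade, String rate, String term,
                          String defaultRate, String amount, String purpose) {
        return new Loan(id.trim(), grade.trim(), parsePercent(rate), Integer.parseInt(term.trim()),
                        parsePercent(defaultRate), parseDollars(amount), purpose.trim());
    }

    @Override
    public String toString() {
        return id + " " + grade + " " + ratePercent + "% " + termMonths + " months, default "
            + defaultRatePercent + "%, $" + amountDollars + ", " + purpose;
    }

    public static void main(String[] args) {
        // Cell texts copied from the first row of the table in the question.
        Loan loan = fromCells("LAI00100138", "D3", "27.12%", "36", "1.95%", "$6,800", "Other");
        System.out.println(loan);
    }
}
```

`BigDecimal` is used for the amounts and rates so that values such as `27.12` are kept exactly, which avoids the rounding surprises of `double` if you later sum or compare them.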
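Answer:
There are many different options, for example XPath, CSS selectors, or simply accessing the DOM from Java. The HtmlUnit homepage has some examples (see the "Getting Started" topic).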
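Of those options, XPath is probably the closest replacement for looking elements up by `id`. The trick for this page is matching one token of a multi-valued `class` attribute, for which the usual XPath 1.0 idiom is `contains(concat(' ', normalize-space(@class), ' '), ' term ')`; it matches `class="term ng-binding"` but not a class that merely starts with `term`. HtmlUnit's `HtmlPage.getByXPath(String)` and `getFirstByXPath(String)` accept XPath expressions like these (and its CSS selector support, e.g. `querySelectorAll("td.term")`, does the token matching for you). To keep the sketch runnable without HtmlUnit, it evaluates the expressions with the JDK's `javax.xml.xpath` on a trimmed, well-formed excerpt of one row; that is an assumption, since the real page is not well-formed XML and would need HtmlUnit's HTML parser. `XPathByClass`, `byClass` and `texts` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathByClass {

    // Trimmed, well-formed excerpt of the first loan row from the question.
    static final String ROW =
        "<table><tbody class=\"ng-scope\"><tr>"
        + "<td class=\"loan-id\"><a href=\"\"><span class=\"ng-binding\">LAI00100138</span></a></td>"
        + "<td class=\"grade-and-rate\"><div><span class=\"grade-letter ng-binding\">D3</span>"
        + " <span class=\"rate ng-binding\">27.12%</span></div></td>"
        + "<td class=\"term ng-binding\">36</td>"
        + "<td class=\"loan-amount ng-binding\">$6,800</td>"
        + "</tr></tbody></table>";

    // Builds an XPath selecting <tag> elements whose class attribute contains `token` as a whole word.
    static String byClass(String tag, String token) {
        return "//" + tag + "[contains(concat(' ', normalize-space(@class), ' '), ' " + token + " ')]";
    }

    // Evaluates `expr` against the markup and returns the trimmed text of every match.
    static List<String> texts(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent().trim());
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(texts(ROW, byClass("td", "term")));          // term cell
        System.out.println(texts(ROW, byClass("span", "grade-letter"))); // grade
        System.out.println(texts(ROW, byClass("td", "loan-amount")));   // amount
    }
}
```

The padding with spaces is what makes the match exact: `byClass("td", "loan")` matches neither `loan-id` nor `loan-amount`, whereas a plain `contains(@class, 'loan')` would match both.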