HtmlUnit: how to parse elements that have no id attribute

Time: 2017-06-20 04:03:11

Tags: java htmlunit

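I'm new to Java and have been trying to scrape some websites.

On other sites I could search for elements by their "id=xxx" value. Unfortunately, the site I'm currently practicing on doesn't use ids.

How can I use HtmlUnit to parse the table and pick out the cells I want?

At the end I've pasted the relevant bits of the HTML from the site. The items I want to scrape are the cells of the first data row of the table, shown directly below: for example, I want to extract the term (36), the grade (D3), the loan amount ($6,800) and so on. I will then do the same for all rows.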

<td class="loan-id" ng-click="row.toggleDetails()"><a href=""> <!-- ngIf: loan.hasProtect -->
                <!-- ngIf: loan.isUppingLimits --> <!-- ngIf: loan.isBuyingDeeper --> <span
                        ng-bind-html="loan.name | softHyphens" class="ng-binding">LAI­00100138</span> </a></td>
            <td class="grade-and-rate">
                <div><span class="grade grade-d" ng-class="row.getGradeClass()"></span> <span
                        class="grade-letter ng-binding">D3</span> <span class="rate ng-binding">27.12%</span></div>
            </td>
            <td class="term ng-binding">36</td>
            <td class="default-rate ng-binding">1.95%</td>
            <td class="loan-amount ng-binding">$6,800</td>
            <td class="purpose ng-binding">Other</td>

All help appreciated.

(Relevant) HTML code from the site follows:

<table>
    <colgroup>
        <col class="loan-id">
        <col class="grade-and-rate">
        <col class="term">
        <col class="default-rate">
        <col class="loan-amount">
        <col class="purpose">
        <col class="amount-funded">
        <col class="time-remaining">
        <col class="order-notes">
        <col class="invested-amount">
        <col class="transaction-fee">
    </colgroup>
    <thead>
    <tr class="grouping">
        <td colspan="8" class="loans-group wide"> <!-- featureFlag: HYBRID_MARKETPLACE_ACCESS is off --> LOANS</td>
        <td colspan="5" class="loans-group narrow"> LOANS</td>
        <td colspan="3" class="order-group">ORDER</td>
    </tr>
    <tr class="specific-headings">
        <td class="loan-id" ng-class="{ 'sort-active': vm.isActiveSort('name') }"><a href=""
                                                                                     ng-click="vm.updateSortKey('name')">
            LOAN ID </a></td>
        <td class="grade-and-rate" ng-class="{ 'sort-active': vm.isActiveSort('grade') }"><a href=""
                                                                                             ng-click="vm.updateSortKey('grade')">
            Grade &amp; rate* </a></td>
        <td class="term" ng-class="{ 'sort-active': vm.isActiveSort('term') }"><a href=""
                                                                                  ng-click="vm.updateSortKey('term')">
            <abbr title="Maximum term" translate="MAX_TERM_SHORT" class="ng-scope ng-binding">Max. term</abbr><br>
            <aside>(months)</aside>
        </a></td>
        <td class="default-rate" ng-class="{ 'sort-active': vm.isActiveSort('expectedDefaultRate') }"><a href=""
                                                                                                         ng-click="vm.updateSortKey('expectedDefaultRate')">
            <abbr title="Estimated annual default rate" translate="ESTIMATED_DEFAULT_RATE_SHORT" class="ng-scope">Est.
                default rate</abbr>** </a></td>
        <td class="loan-amount" ng-class="{ 'sort-active': vm.isActiveSort('amount') }"><a href=""
                                                                                           ng-click="vm.updateSortKey('amount')">
            Loan amount </a></td>
        <td class="purpose" ng-class="{ 'sort-active': vm.isActiveSort('loanPurpose') }"><a href=""
                                                                                            ng-click="vm.updateSortKey('loanPurpose')">
            Purpose </a></td>
        <td class="amount-funded" ng-class="{ 'sort-active': vm.isActiveSort('percentageFunded') }"><a href=""
                                                                                                       ng-click="vm.updateSortKey('percentageFunded')">
            Amount funded </a></td>
        <td class="time-remaining sort-active" ng-class="{ 'sort-active': vm.isActiveSort('expiryDate') }"><a href=""
                                                                                                              ng-click="vm.updateSortKey('expiryDate')">
            Time remaining </a></td>
        <td class="order-notes"> $25 multiples</td>
        <td class="order-amount-fee" colspan="2"> Amount <br> <!-- featureFlag: TRANSACTION_FEE is off -->
            <aside feature-flag="TRANSACTION_FEE" feature-flag-hide="" class="ng-scope">(value / percentage)</aside>
        </td>
    </tr>
    </thead> <!-- ngRepeat: loan in vm.data | page: vm.limit | unique: 'id' track by loan.id -->
    <tbody ng-repeat="loan in vm.data | page: vm.limit | unique: 'id' track by loan.id"
           ng-controller="BrowseMarketplaceRowController as row" ng-init="row.setLoan(loan); row.setAccount(vm.account)"
           class="ng-scope" style="">
    <tr>
        <td class="loan-id" ng-click="row.toggleDetails()"><a href=""> <!-- ngIf: loan.hasProtect -->
            <!-- ngIf: loan.isUppingLimits --> <!-- ngIf: loan.isBuyingDeeper --> <span
                    ng-bind-html="loan.name | softHyphens" class="ng-binding">LAI­00100138</span> </a></td>
        <td class="grade-and-rate">
            <div><span class="grade grade-d" ng-class="row.getGradeClass()"></span> <span
                    class="grade-letter ng-binding">D3</span> <span class="rate ng-binding">27.12%</span></div>
        </td>
        <td class="term ng-binding">36</td>
        <td class="default-rate ng-binding">1.95%</td>
        <td class="loan-amount ng-binding">$6,800</td>
        <td class="purpose ng-binding">Other</td>
        <td class="amount-funded">
            <hmy-progress-bar progress="33%" class="ng-isolate-scope"><span class="bar" style="width: 33%;"></span>
                <span class="text ng-binding">33%</span></hmy-progress-bar>
            <aside class="ng-binding"> 181 notes remaining</aside>
        </td>
        <td class="time-remaining ng-binding">14 days</td>
        <td class="order-notes ng-pristine ng-valid ng-untouched ng-valid-min ng-valid-pattern"
            ng-class="{ focused: row.isNotesFieldFocused, 'whole-loan': row.loan.isFractionalised === false }"
            ng-form="orderNotesField" hmy-pass-click-to-input=""> <!-- ngIf: row.loan.isFractionalised !== false -->
            <div class="position ng-binding ng-scope" ng-if="row.loan.isFractionalised !== false">
                <!-- ngIf: orderNotesField.$error.min -->
                <!-- ngIf: row.isUnderDiversified(row.orderCountTotal()) && !orderNotesField.$error.min -->
                <!-- ngIf: row.loan.alreadyInvestedAmount > 0 --> <input name="order" tabindex="1"
                                                                         ng-pattern="/^[0-9]*$/"
                                                                         ng-model="row.orderCount"
                                                                         ng-model-options="{ getterSetter: true, allowInvalid: true }"
                                                                         ng-focus="row.isNotesFieldFocused = true"
                                                                         ng-blur="row.isNotesFieldFocused = false"
                                                                         ng-min="1"
                                                                         class="ng-pristine ng-untouched ng-valid ng-empty ng-valid-min ng-valid-pattern"
                                                                         type="number"></div>
            <!-- end ngIf: row.loan.isFractionalised !== false --> <!-- ngIf: row.loan.isFractionalised === false -->
        </td>
        <td class="invested-amount ng-binding"></td> <!-- featureFlag: TRANSACTION_FEE is off -->
        <td class="transaction-fee ng-binding ng-scope" feature-flag="TRANSACTION_FEE" feature-flag-hide=""></td>
    </tr>
    <tr> <!-- ngIf: row.isOpen --> </tr>
    </tbody><!-- end ngRepeat: loan in vm.data | page: vm.limit | unique: 'id' track by loan.id -->
    <tbody ng-repeat="loan in vm.data | page: vm.limit | unique: 'id' track by loan.id"
           ng-controller="BrowseMarketplaceRowController as row" ng-init="row.setLoan(loan); row.setAccount(vm.account)"
           class="ng-scope" style="">
    <tr>
        <td class="loan-id" ng-click="row.toggleDetails()"><a href=""> <!-- ngIf: loan.hasProtect -->
            <!-- ngIf: loan.isUppingLimits --> <!-- ngIf: loan.isBuyingDeeper --> <span
                    ng-bind-html="loan.name | softHyphens" class="ng-binding">LAI­00100140</span> </a></td>
        <td class="grade-and-rate">
            <div><span class="grade grade-b" ng-class="row.getGradeClass()"></span> <span
                    class="grade-letter ng-binding">B3</span> <span class="rate ng-binding">15.16%</span></div>
        </td>
        <td class="term ng-binding">60</td>
        <td class="default-rate ng-binding">0.54%</td>
        <td class="loan-amount ng-binding">$7,275</td>
        <td class="purpose ng-binding">Other</td>
        <td class="amount-funded">
            <hmy-progress-bar progress="23%" class="ng-isolate-scope"><span class="bar" style="width: 23%;"></span>
                <span class="text ng-binding">23%</span></hmy-progress-bar>
            <aside class="ng-binding"> 223 notes remaining</aside>
        </td>
        <td class="time-remaining ng-binding">14 days</td>
        <td class="order-notes ng-pristine ng-valid ng-untouched ng-valid-min ng-valid-pattern"
            ng-class="{ focused: row.isNotesFieldFocused, 'whole-loan': row.loan.isFractionalised === false }"
            ng-form="orderNotesField" hmy-pass-click-to-input=""> <!-- ngIf: row.loan.isFractionalised !== false -->
            <div class="position ng-binding ng-scope" ng-if="row.loan.isFractionalised !== false">
                <!-- ngIf: orderNotesField.$error.min -->
                <!-- ngIf: row.isUnderDiversified(row.orderCountTotal()) && !orderNotesField.$error.min -->
                <!-- ngIf: row.loan.alreadyInvestedAmount > 0 --> <input name="order" tabindex="2"
                                                                         ng-pattern="/^[0-9]*$/"
                                                                         ng-model="row.orderCount"
                                                                         ng-model-options="{ getterSetter: true, allowInvalid: true }"
                                                                         ng-focus="row.isNotesFieldFocused = true"
                                                                         ng-blur="row.isNotesFieldFocused = false"
                                                                         ng-min="1"
                                                                         class="ng-pristine ng-untouched ng-valid ng-empty ng-valid-min ng-valid-pattern"
                                                                         type="number"></div>
            <!-- end ngIf: row.loan.isFractionalised !== false --> <!-- ngIf: row.loan.isFractionalised === false -->
        </td>
        <td class="invested-amount ng-binding"></td> <!-- featureFlag: TRANSACTION_FEE is off -->
        <td class="transaction-fee ng-binding ng-scope" feature-flag="TRANSACTION_FEE" feature-flag-hide=""></td>
    </tr>
    <tr> <!-- ngIf: row.isOpen --> </tr>
    </tbody><!-- end ngRepeat: loan in vm.data | page: vm.limit | unique: 'id' track by loan.id -->
    <tbody ng-repeat="loan in vm.data | page: vm.limit | unique: 'id' track by loan.id"
           ng-controller="BrowseMarketplaceRowController as row" ng-init="row.setLoan(loan); row.setAccount(vm.account)"
           class="ng-scope" style="">
    <tr>
        <td class="loan-id" ng-click="row.toggleDetails()"><a href=""> <!-- ngIf: loan.hasProtect -->
            <!-- ngIf: loan.isUppingLimits --> <!-- ngIf: loan.isBuyingDeeper --> <span
                    ng-bind-html="loan.name | softHyphens" class="ng-binding">LAI­00100139</span> </a></td>
        <td class="grade-and-rate">
            <div><span class="grade grade-e" ng-class="row.getGradeClass()"></span> <span
                    class="grade-letter ng-binding">E2</span> <span class="rate ng-binding">33.95%</span></div>
        </td>
        <td class="term ng-binding">60</td>
        <td class="default-rate ng-binding">3.52%</td>
        <td class="loan-amount ng-binding">$7,075</td>
        <td class="purpose ng-binding">Debt Consolidation</td>
        <td class="amount-funded">
            <hmy-progress-bar progress="29%" class="ng-isolate-scope"><span class="bar" style="width: 29%;"></span>
                <span class="text ng-binding">29%</span></hmy-progress-bar>
            <aside class="ng-binding"> 200 notes remaining</aside>
        </td>
        <td class="time-remaining ng-binding">14 days</td>
        <td class="order-notes ng-pristine ng-valid ng-untouched ng-valid-min ng-valid-pattern"
            ng-class="{ focused: row.isNotesFieldFocused, 'whole-loan': row.loan.isFractionalised === false }"
            ng-form="orderNotesField" hmy-pass-click-to-input=""> <!-- ngIf: row.loan.isFractionalised !== false -->
            <div class="position ng-binding ng-scope" ng-if="row.loan.isFractionalised !== false">
                <!-- ngIf: orderNotesField.$error.min -->
                <!-- ngIf: row.isUnderDiversified(row.orderCountTotal()) && !orderNotesField.$error.min -->
                <!-- ngIf: row.loan.alreadyInvestedAmount > 0 --> <input name="order" tabindex="3"
                                                                         ng-pattern="/^[0-9]*$/"
                                                                         ng-model="row.orderCount"
                                                                         ng-model-options="{ getterSetter: true, allowInvalid: true }"
                                                                         ng-focus="row.isNotesFieldFocused = true"
                                                                         ng-blur="row.isNotesFieldFocused = false"
                                                                         ng-min="1"
                                                                         class="ng-pristine ng-untouched ng-valid ng-empty ng-valid-min ng-valid-pattern"
                                                                         type="number"></div>
            <!-- end ngIf: row.loan.isFractionalised !== false --> <!-- ngIf: row.loan.isFractionalised === false -->
        </td>
        <td class="invested-amount ng-binding"></td> <!-- featureFlag: TRANSACTION_FEE is off -->
        <td class="transaction-fee ng-binding ng-scope" feature-flag="TRANSACTION_FEE" feature-flag-hide=""></td>
    </tr>
    <tr> <!-- ngIf: row.isOpen --> </tr>
    </tbody><!-- end ngRepeat: loan in vm.data | page: vm.limit | unique: 'id' track by loan.id -->
    <tbody ng-repeat="loan in vm.data | page: vm.limit | unique: 'id' track by loan.id"
           ng-controller="BrowseMarketplaceRowController as row" ng-init="row.setLoan(loan); row.setAccount(vm.account)"
           class="ng-scope" style="">
    <tr>
        <td class="loan-id" ng-click="row.toggleDetails()"><a href=""> <!-- ngIf: loan.hasProtect -->
            <!-- ngIf: loan.isUppingLimits --> <!-- ngIf: loan.isBuyingDeeper --> <span
                    ng-bind-html="loan.name | softHyphens" class="ng-binding">LAI­00100142</span> </a></td>
        <td class="grade-and-rate">
            <div><span class="grade grade-f" ng-class="row.getGradeClass()"></span> <span
                    class="grade-letter ng-binding">F3</span> <span class="rate ng-binding">39.61%</span></div>
        </td>
        <td class="term ng-binding">60</td>
        <td class="default-rate ng-binding">9.79%</td>
        <td class="loan-amount ng-binding">$5,500</td>
        <td class="purpose ng-binding">Holiday Expenses</td>
        <td class="amount-funded">
            <hmy-progress-bar progress="1%" class="ng-isolate-scope"><span class="bar" style="width: 1%;"></span>
                <span class="text ng-binding">1%</span></hmy-progress-bar>
            <aside class="ng-binding"> 217 notes remaining</aside>
        </td>
        <td class="time-remaining ng-binding">14 days</td>
        <td class="order-notes ng-pristine ng-valid ng-untouched ng-valid-min ng-valid-pattern"
            ng-class="{ focused: row.isNotesFieldFocused, 'whole-loan': row.loan.isFractionalised === false }"
            ng-form="orderNotesField" hmy-pass-click-to-input=""> <!-- ngIf: row.loan.isFractionalised !== false -->
            <div class="position ng-binding ng-scope" ng-if="row.loan.isFractionalised !== false">
                <!-- ngIf: orderNotesField.$error.min -->
                <!-- ngIf: row.isUnderDiversified(row.orderCountTotal()) && !orderNotesField.$error.min -->
                <!-- ngIf: row.loan.alreadyInvestedAmount > 0 --> <input name="order" tabindex="4"
                                                                         ng-pattern="/^[0-9]*$/"
                                                                         ng-model="row.orderCount"
                                                                         ng-model-options="{ getterSetter: true, allowInvalid: true }"
                                                                         ng-focus="row.isNotesFieldFocused = true"
                                                                         ng-blur="row.isNotesFieldFocused = false"
                                                                         ng-min="1"
                                                                         class="ng-pristine ng-untouched ng-valid ng-empty ng-valid-min ng-valid-pattern"
                                                                         type="number"></div>
            <!-- end ngIf: row.loan.isFractionalised !== false --> <!-- ngIf: row.loan.isFractionalised === false -->
        </td>
        <td class="invested-amount ng-binding"></td> <!-- featureFlag: TRANSACTION_FEE is off -->
        <td class="transaction-fee ng-binding ng-scope" feature-flag="TRANSACTION_FEE" feature-flag-hide=""></td>
    </tr>
    <tr> <!-- ngIf: row.isOpen --> </tr>
    </tbody><!-- end ngRepeat: loan in vm.data | page: vm.limit | unique: 'id' track by loan.id -->
    <tbody ng-repeat="loan in vm.data | page: vm.limit | unique: 'id' track by loan.id"
           ng-controller="BrowseMarketplaceRowController as row" ng-init="row.setLoan(loan); row.setAccount(vm.account)"
           class="ng-scope" style="">
    <tr>
        <td class="loan-id" ng-click="row.toggleDetails()"><a href=""> <!-- ngIf: loan.hasProtect -->
            <!-- ngIf: loan.isUppingLimits --> <!-- ngIf: loan.isBuyingDeeper --> <span
                    ng-bind-html="loan.name | softHyphens" class="ng-binding">LAI­00100133</span> </a></td>
        <td class="grade-and-rate">
            <div><span class="grade grade-a" ng-class="row.getGradeClass()"></span> <span
                    class="grade-letter ng-binding">A5</span> <span class="rate ng-binding">13.25%</span></div>
        </td>
        <td class="term ng-binding">60</td>
        <td class="default-rate ng-binding">0.27%</td>
        <td class="loan-amount ng-binding">$35,575</td>
        <td class="purpose ng-binding">Debt Consolidation</td>
        <td class="amount-funded">
            <hmy-progress-bar progress="1%" class="ng-isolate-scope"><span class="bar" style="width: 1%;"></span>
                <span class="text ng-binding">1%</span></hmy-progress-bar>
            <aside class="ng-binding"> 1,406 notes remaining</aside>
        </td>
        <td class="time-remaining ng-binding">14 days</td>
        <td class="order-notes ng-pristine ng-valid ng-untouched ng-valid-min ng-valid-pattern"
            ng-class="{ focused: row.isNotesFieldFocused, 'whole-loan': row.loan.isFractionalised === false }"
            ng-form="orderNotesField" hmy-pass-click-to-input=""> <!-- ngIf: row.loan.isFractionalised !== false -->
            <div class="position ng-binding ng-scope" ng-if="row.loan.isFractionalised !== false">
                <!-- ngIf: orderNotesField.$error.min -->
                <!-- ngIf: row.isUnderDiversified(row.orderCountTotal()) && !orderNotesField.$error.min -->
                <!-- ngIf: row.loan.alreadyInvestedAmount > 0 --> <input name="order" tabindex="5"
                                                                         ng-pattern="/^[0-9]*$/"
                                                                         ng-model="row.orderCount"
                                                                         ng-model-options="{ getterSetter: true, allowInvalid: true }"
                                                                         ng-focus="row.isNotesFieldFocused = true"
                                                                         ng-blur="row.isNotesFieldFocused = false"
                                                                         ng-min="1"
                                                                         class="ng-pristine ng-untouched ng-valid ng-empty ng-valid-min ng-valid-pattern"
                                                                         type="number"></div>
            <!-- end ngIf: row.loan.isFractionalised !== false --> <!-- ngIf: row.loan.isFractionalised === false -->
        </td>
        <td class="invested-amount ng-binding"></td> <!-- featureFlag: TRANSACTION_FEE is off -->
        <td class="transaction-fee ng-binding ng-scope" feature-flag="TRANSACTION_FEE" feature-flag-hide=""></td>
    </tr>
    <tr> <!-- ngIf: row.isOpen --> </tr>
    </tbody><!-- end ngRepeat: loan in vm.data | page: vm.limit | unique: 'id' track by loan.id -->
    <tbody> <!-- ngIf: vm.areNoLoans() --> <!-- ngIf: vm.wasLoadingError --> <!-- ngIf: vm.isCurrentPageLoading() -->
    <!-- ngIf: vm.isOnEmptyPage() --> <!-- ngIf: vm.areAllLoansFilteredOut() --> </tbody>
</table>

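Edit 1: I've pasted the complete HTML from the page into pastebin (https://pastebin.com/TA3EqECQ). One thing I find interesting is that if I right-click the page and choose "View Page Source" I get a much smaller page, but it appears to contain JavaScript which, I'm guessing, populates the page more fully. The HTML pasted into pastebin came from right-clicking the page (in Firefox) and selecting "Inspect Element".

Edit 2: @Jobin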

// Try logging in
try {
    // Get a new web client
    m_webClient = getNewWebClient();

    // Go to the login page
    final HtmlPage loginPage = m_webClient.getPage(m_settings.getLoginPage());

    // Get the form that we are dealing with and, within that form,
    // find the submit button and the login fields that we need to fill in.
    final HtmlForm form = loginPage.getFormByName(m_settings.getLoginForm());
    final HtmlSubmitInput button = form.getInputByName(m_settings.getLoginFormButton());
    final HtmlTextInput emailField = form.getInputByName(m_settings.getEmailField());
    final HtmlPasswordInput passwordField = form.getInputByName(m_settings.getPasswordField());

    // Enter the login values
    emailField.setValueAttribute(m_username);
    passwordField.setValueAttribute(m_password);

    // Now submit the form by clicking the button and get back the second page.
    // This second page is NOT the loans page; we need to navigate to that separately.
    // TODO check the login was successful
    postLoginPage = button.click();
} catch (Exception e) {
    // TODO post error to console
    m_webClient.close();
    m_webClient = null;
    return null;
}

HtmlPage loansPage = null;
try {
    // Go to the loans page
    loansPage = m_webClient.getPage(m_settings.getBrowseLoansPage());

    // TODO Parse the loans here
} catch (Exception e) {
    // TODO post error to console
}

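1 answer:

Answer 0 (score: 0)

There are many different options, e.g. XPath, CSS selectors, or simply walking the DOM from Java. The HtmlUnit home page has some examples (under the "Getting Started" topic).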
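To make the XPath option concrete, here is a minimal sketch of selecting cells by class instead of by id. It deliberately uses only the JDK's built-in XPath engine on a trimmed, well-formed copy of two rows from the question, so it runs without HtmlUnit on the classpath. The class name `LoanRowXPath`, the helper `hasClass` and the `SAMPLE` string are made up for illustration. With HtmlUnit, the same expressions could be handed to `getByXPath(...)` on the loans page, and `querySelectorAll("td.term")` would be the CSS-selector counterpart. Since the rows are filled in by JavaScript (see Edit 1), the page has to have finished rendering before any of this finds anything; `WebClient.waitForBackgroundJavaScript(...)` may help there.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class LoanRowXPath {

    // Hypothetical, trimmed copy of two rows from the question's table,
    // reduced to well-formed XML so the JDK parser accepts it.
    static final String SAMPLE =
          "<table><thead><tr><td class=\"term\">Max. term</td></tr></thead>"
        + "<tbody><tr>"
        + "<td class=\"grade-and-rate\"><div><span class=\"grade grade-d\"></span>"
        + "<span class=\"grade-letter ng-binding\">D3</span>"
        + "<span class=\"rate ng-binding\">27.12%</span></div></td>"
        + "<td class=\"term ng-binding\">36</td>"
        + "<td class=\"loan-amount ng-binding\">$6,800</td>"
        + "</tr><tr></tr></tbody>"
        + "<tbody><tr>"
        + "<td class=\"grade-and-rate\"><div><span class=\"grade grade-b\"></span>"
        + "<span class=\"grade-letter ng-binding\">B3</span>"
        + "<span class=\"rate ng-binding\">15.16%</span></div></td>"
        + "<td class=\"term ng-binding\">60</td>"
        + "<td class=\"loan-amount ng-binding\">$7,275</td>"
        + "</tr><tr></tr></tbody></table>";

    // XPath 1.0 predicate that matches one whole class token, so "grade"
    // does not accidentally match "grade-letter" or "grade-d".
    static String hasClass(String token) {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + token + " ')";
    }

    // Returns one [grade, term, amount] list per data row.
    static List<List<String>> extract(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<List<String>> result = new ArrayList<>();
        try {
            // Only body rows that contain a "term" cell: this skips the
            // header rows and the empty placeholder <tr> elements.
            NodeList rows = (NodeList) xpath.evaluate(
                    "//tbody/tr[td[" + hasClass("term") + "]]",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                // Relative expressions are evaluated against the current row.
                String grade = xpath.evaluate(".//span[" + hasClass("grade-letter") + "]", row).trim();
                String term = xpath.evaluate("td[" + hasClass("term") + "]", row).trim();
                String amount = xpath.evaluate("td[" + hasClass("loan-amount") + "]", row).trim();
                result.add(Arrays.asList(grade, term, amount));
            }
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        for (List<String> row : extract(SAMPLE)) {
            System.out.println(row);
        }
    }
}
```

The token-matching predicate is the important part: a plain `@class='term'` would miss the real cells, because their class attribute is `term ng-binding`, and a plain `contains(@class, 'grade')` would also match `grade-letter`.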