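I'm building an email scraper in Java that needs to extract information from specific emails. These emails have been sent over several years. The problem is that the HTML changes slightly every year, so my code works for one particular year but not for the next or the previous one. I'm looking for a way to write smarter code. NEEDED is the value I need; VARIABLE can differ. SAME TITLE is always the same.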
<span class="confirmationtitle">SAME TITLE 1</span></td></tr>
<tr>
<td><span class="confirmationleft">VARIABLE</span></td>
<td><span class="confirmationright">NEEDED1 </span></td>
</tr>
<tr>
<td><span class="confirmationleft">VARIABLE</span></td>
<td><span class="confirmationright">NEEDED2</span></td>
</tr>
<tr>
<td><span class="confirmationleft">VARIABLE</span></td>
<td><span class="confirmationright">NEEDED3</span></td>
</tr>
<tr>
<td><span class="confirmationleft">VARIABLE</span></td>
<td><span class="confirmationright">NEEDED4</span></td>
</tr>
<tr>
<td><span class="confirmationleft">VARIABLE</span></td>
<td><span class="confirmationright">NEEDED5</span></td>
</tr>
<tr>
<td><span class="confirmationleft">VARIABLE</span></td>
<td><span class="confirmationright">NEEDED6</span></td>
</tr>
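The code above is from year x; the code below is from year y. There are several table sections like this, each carrying different information.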
<tr>
<div style="font-weight: bold; display: block; margin-top: 20px;">SAME TITLE 1</div>
</td>
</tr>
<tr>
<td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
<td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED1 </span></td>
</tr>
<tr>
<td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
<td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED2</span></td>
</tr>
<tr>
<td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
<td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED3</span></td>
</tr>
<tr>
<td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
<td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED4</span></td>
</tr>
<tr>
<td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
<td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED5</span></td>
</tr>
<tr>
<td><span style="width: 145px; display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; vertical-align: top;">VARIABLE</span></td>
<td><span style="display: inline-block; zoom:1; /* IE 7 Hack starts here*/ *display:inline; line-height: 18px; width: 488px; vertical-align: top;">NEEDED6</span></td>
</tr>
<tr>
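In both layouts the rows carry the same text in the same order; only the attributes on the `<span>` elements changed (a `class` in year x, an inline `style` in year y). A minimal sketch of normalizing both layouts, assuming every value sits inside its own `<td>` and there are no nested tables: take each `<td>` body and strip all tags from it with a regular expression. `CellTextSketch` and `cellTexts` are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CellTextSketch {
    // Body of every <td>...</td>; DOTALL lets a cell span several lines.
    // Assumes no nested tables (a lazy match would stop at an inner </td>).
    private static final Pattern TD = Pattern.compile("<td[^>]*>(.*?)</td>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Any tag, whatever class or inline style attributes it carries.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Returns the visible text of each <td> cell, in document order. */
    public static List<String> cellTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher m = TD.matcher(html);
        while (m.find()) {
            // Drop nested tags such as <span class=...> or <span style=...>, then trim.
            texts.add(TAG.matcher(m.group(1)).replaceAll("").trim());
        }
        return texts;
    }

    public static void main(String[] args) {
        String yearX = "<tr><td><span class=\"confirmationleft\">VARIABLE</span></td>"
                + "<td><span class=\"confirmationright\">NEEDED1 </span></td></tr>";
        String yearY = "<tr><td><span style=\"width: 145px; display: inline-block;\">VARIABLE</span></td>"
                + "<td><span style=\"display: inline-block; width: 488px;\">NEEDED1 </span></td></tr>";
        System.out.println(cellTexts(yearX)); // [VARIABLE, NEEDED1]
        System.out.println(cellTexts(yearY)); // [VARIABLE, NEEDED1]
    }
}
```

Because the attributes are discarded, switching from `class="confirmationright"` to an inline `style` no longer changes the result. Regex-based tag stripping is fragile on nested or malformed markup; a real HTML parser such as jsoup would be the more robust choice.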
The code for these particular table rows:
// headerInfo holds the HTML fragment; split it once, then pick each value
// out by a fixed position in the markup (this is what breaks when the layout changes).
String[] cells = headerInfo.split("</span></td>");
// Renamed from SpecificInfo so it matches the TravellerInfo used in the loop and return.
String[] TravellerInfo = new String[6];
String TravellerInfoGender = TravellerInfo[0] = cells[1].split("</span>")[0].split(">")[2];
String TravellerInfoFirstname = TravellerInfo[1] = cells[3].split("</span>")[0].split(">")[2];
String TravellerInfoMiddleName = TravellerInfo[2] = cells[5].split("</span>")[0].split(">")[2];
String TravellerInfoSurName = TravellerInfo[3] = cells[7].split("</span>")[0].split(">")[2];
String TravellerInfoDateOfBirth = TravellerInfo[4] = cells[9].split("</span>")[0].split(">")[2];
String TravellerInfoNationality = TravellerInfo[5] = cells[11].split("</span>")[0].split(">")[2];
// Write each extracted value out (writeToFile is defined elsewhere).
for (int i = 0; i < TravellerInfo.length; i++) {
    writeToFile(TravellerInfo[i]);
}
return TravellerInfo;
Here headerInfo contains the HTML fragment from the two samples above.
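Instead of fixed split indices, the title text (which never changes) could serve as the anchor: find it, then read the second cell of each following two-cell row. A sketch under stated assumptions: the NEEDED value is always in the right-hand `<td>`, there are no nested tables, and the title text occurs once. `TitleSectionSketch`, `valuesAfterTitle` and the `maxValues` limit are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleSectionSketch {
    // A table row, a cell, and any tag; DOTALL lets matches span line breaks.
    private static final Pattern ROW = Pattern.compile("<tr[^>]*>(.*?)</tr>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TD = Pattern.compile("<td[^>]*>(.*?)</td>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Up to maxValues right-hand cell texts from the two-cell rows after the title. */
    public static List<String> valuesAfterTitle(String html, String title, int maxValues) {
        List<String> values = new ArrayList<>();
        int pos = html.indexOf(title);
        if (pos < 0) {
            return values; // title not found: empty list instead of an exception
        }
        // Only search the part of the email that follows the title.
        Matcher row = ROW.matcher(html).region(pos + title.length(), html.length());
        while (values.size() < maxValues && row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher td = TD.matcher(row.group(1));
            while (td.find()) {
                cells.add(TAG.matcher(td.group(1)).replaceAll("").trim());
            }
            if (cells.size() == 2) {
                values.add(cells.get(1)); // keep NEEDED, ignore the VARIABLE label
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String yearX = "<span class=\"confirmationtitle\">SAME TITLE 1</span></td></tr>\n"
                + "<tr>\n<td><span class=\"confirmationleft\">VARIABLE</span></td>\n"
                + "<td><span class=\"confirmationright\">NEEDED1 </span></td>\n</tr>\n"
                + "<tr>\n<td><span class=\"confirmationleft\">VARIABLE</span></td>\n"
                + "<td><span class=\"confirmationright\">NEEDED2</span></td>\n</tr>\n";
        String yearY = "<tr>\n<div style=\"font-weight: bold;\">SAME TITLE 1</div>\n</td>\n</tr>\n"
                + "<tr>\n<td><span style=\"width: 145px;\">VARIABLE</span></td>\n"
                + "<td><span style=\"width: 488px;\">NEEDED1 </span></td>\n</tr>\n"
                + "<tr>\n<td><span style=\"width: 145px;\">VARIABLE</span></td>\n"
                + "<td><span style=\"width: 488px;\">NEEDED2</span></td>\n</tr>\n";
        System.out.println(valuesAfterTitle(yearX, "SAME TITLE 1", 6)); // [NEEDED1, NEEDED2]
        System.out.println(valuesAfterTitle(yearY, "SAME TITLE 1", 6)); // [NEEDED1, NEEDED2]
    }
}
```

In the code above, `valuesAfterTitle(headerInfo, "SAME TITLE 1", 6).toArray(new String[0])` could stand in for the six hard-coded split chains, though the result may hold fewer than six entries if a row is missing in some year.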
I'm hoping for an approach where I don't have to hard-code every small change.
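Since the titles never change and one email holds several such sections, another option is to cut the email at each known title and collect the two-cell rows beneath it into a map. A sketch assuming the full list of titles is known up front, no title is a substring of another, and values sit in the right-hand `<td>` without nested tables; `SectionMapSketch` and `sections` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SectionMapSketch {
    // A table row, a cell, and any tag; DOTALL lets matches span line breaks.
    private static final Pattern ROW = Pattern.compile("<tr[^>]*>(.*?)</tr>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TD = Pattern.compile("<td[^>]*>(.*?)</td>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Maps each title found in the email to the right-hand cell texts listed under it. */
    public static Map<String, List<String>> sections(String html, List<String> titles) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String title : titles) {
            int pos = html.indexOf(title);
            if (pos < 0) {
                continue; // this section is absent from this year's email
            }
            int start = pos + title.length();
            // The section ends at the nearest following title, or at the end of the email.
            int end = html.length();
            for (String other : titles) {
                int idx = html.indexOf(other, start);
                if (idx >= 0 && idx < end) {
                    end = idx;
                }
            }
            List<String> values = new ArrayList<>();
            Matcher row = ROW.matcher(html).region(start, end);
            while (row.find()) {
                List<String> cells = new ArrayList<>();
                Matcher td = TD.matcher(row.group(1));
                while (td.find()) {
                    cells.add(TAG.matcher(td.group(1)).replaceAll("").trim());
                }
                if (cells.size() == 2) {
                    values.add(cells.get(1)); // keep NEEDED, ignore the VARIABLE label
                }
            }
            result.put(title, values);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<tr><td><span class=\"confirmationtitle\">SAME TITLE 1</span></td></tr>"
                + "<tr><td><span class=\"confirmationleft\">VARIABLE</span></td>"
                + "<td><span class=\"confirmationright\">NEEDED1 </span></td></tr>"
                + "<tr><td><span class=\"confirmationtitle\">SAME TITLE 2</span></td></tr>"
                + "<tr><td><span class=\"confirmationleft\">VARIABLE</span></td>"
                + "<td><span class=\"confirmationright\">OTHER1</span></td></tr>";
        System.out.println(sections(html, Arrays.asList("SAME TITLE 1", "SAME TITLE 2")));
        // {SAME TITLE 1=[NEEDED1], SAME TITLE 2=[OTHER1]}
    }
}
```

Bounding each section by the next title keeps rows from one section from leaking into another, so no per-section row count has to be hard-coded.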
Thanks!