I am working with an HTML file, but the source contains a lot of whitespace.
<span itemscope itemtype="Company">
<h3 class="headingLink">
COMPANY
<span itemprop="name"><br/>Kroger</span>
</h3>
<span itemprop="streetAddress">607 Moores Mill Rd</span><br/>
<span itemprop="addressLocality">Huntsville</span>
<span itemprop="addressRegion">AL</span> <span itemprop="postalCode">35811</span>
<br/><br/>
</span> <!-- end itemscope company -->
<span itemscope itemtype="company">
<h3 class="headingLink">
COMPANY2
<span itemprop="name"><br/>North Militia</span>
</h3>
<span itemprop="streetAddress">225 N. Militia Dr</span><br/>
<span itemprop="addressLocality">Lawrenceburg</span>
<span itemprop="addressRegion">TN</span> <span itemprop="postalCode">38464</span>
<br/><br/>
</span> <!-- end itemscope company -->
<span itemprop="telephone">981-112-5521<br/></span>
<a rel="nofollow" id="aboutThisPhoneNumber9" href="javascript:void(0);"
onclick="changeDisplay('/locator/locator/QuickHelp.do?
startsWithFilterProperty=about.this.phone.number&openedItemProperty=about.this.phone.number9', 'quickHelpBox',
'clickQuickHelp', event);return true;" class="dottedLink">About this phone number<span class="ada_hidden">-981-112-
5521
<strong>Company Hours</strong>
<br/>
Lobby:
<!-- Set request scoped variable expected by daysOfWeekRange.jsp -->
Mon-Thu
9-4,
<!-- Set request scoped variable expected by daysOfWeekRange.jsp -->
Fri
9-5,
<!-- Set request scoped variable expected by daysOfWeekRange.jsp -->
Sat-Sun
COMPANY open 24 hours
<br/>(visitors need to call in)
I want to store the name, address, phone number, and company hours in a table. So I tried:
HtmlDocument hdoc = new HtmlDocument();
hdoc.OptionWriteEmptyNodes = true;
hdoc.LoadHtml(searchResult);
HtmlNodeCollection divCon = hdoc.DocumentNode.SelectNodes("//div[@class=\"resultInfo\"]");
string[] ct = new string[divCon.Count];
for (int m = 0; m < divCon.Count; m++)
{
    if (divCon[m].InnerText.Trim().Contains("Company Hours"))
    {
        int ctr = divCon[m].InnerText.Trim().Replace("\n", " ").Replace("<!-- Set request scoped variable expected by daysOfWeekRange.jsp -->","").Replace(" "," ").Replace("</br>","").IndexOf("Branch Hours");
        if (ctr > 0)
            ct[m] = divCon[m].InnerText.Trim().Replace("\n", " ").Replace("<!-- Set request scoped variable expected by daysOfWeekRange.jsp -->", "").Replace(" ", " ").Replace("</br>", "").Substring(ctr, 499);
    }
}
I still get blank spaces in the result. How can I remove them?
THX Rashmi
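
One possible approach, as a minimal sketch only: chained Replace("\n", " ") calls leave tabs, carriage returns, and runs of several spaces behind, so it may help to collapse every run of whitespace with a regular expression instead. The class name WhitespaceCleaner and the method name Collapse below are hypothetical, and the sketch assumes the input is the string read from InnerText. It uses only the .NET base class library, not HtmlAgilityPack.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of HtmlAgilityPack): normalizes text taken from InnerText.
public static class WhitespaceCleaner
{
    // Matches an HTML comment such as <!-- ... -->, including ones that span several lines.
    private static readonly Regex HtmlComment =
        new Regex(@"<!--.*?-->", RegexOptions.Singleline);

    // Matches any run of whitespace: spaces, tabs, CR, LF.
    // The non-breaking space (U+00A0) is listed explicitly for clarity.
    private static readonly Regex WhitespaceRun = new Regex(@"[\s\u00A0]+");

    public static string Collapse(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        // 1. Drop any comment text that leaked into the string.
        string withoutComments = HtmlComment.Replace(text, " ");

        // 2. Decode entities such as &nbsp; and &amp; into real characters.
        string decoded = WebUtility.HtmlDecode(withoutComments);

        // 3. Collapse each whitespace run to one space and trim both ends.
        return WhitespaceRun.Replace(decoded, " ").Trim();
    }
}
```

With the hours text from the sample, WhitespaceCleaner.Collapse("Mon-Thu\n   9-4,\n <!-- Set request scoped variable expected by daysOfWeekRange.jsp -->\n Fri\n  9-5,") should return "Mon-Thu 9-4, Fri 9-5,". In the loop above it could be applied as Collapse(divCon[m].InnerText) before searching for the "Company Hours" marker.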