I am using Html Agility Pack to parse an HTML page. I have successfully extracted the following text into a string:
WOCN11 CWTO 170951 Special weather statement Updated by Environment Canada At 5:51 AM EDT Friday 17 June 2011. Special weather statement issued for.. Sarnia - Lambton London - Middlesex Oxford - Brant Waterloo - Wellington. --------------------------------------------------------------------- Dense fog patches with near zero visibility have been reported in The above areas. Extra caution is urged for travellers in these areas. Fog is expected to lift shortly after sunrise this morning. END/OSPC ACCN10 CWTO 170735 Forecast of thunderstorm potential for the province of Ontario Issued by Environment Canada at 3:35 AM EDT Friday 17 June 2011. The next statement will be issued at 4.30 PM today. --------------------------------------------------------------------- Forecast of thunderstorm potential. Today..Isolated non severe thunderstorms over eastern And Northeastern Ontario. Tonight..Isolated non severe thunderstorms over eastern and Northeastern Ontario this evening. Saturday..Isolated non severe thunderstorms over extreme Southwestern Ontario mainly late in the afternoon and evening. --------------------------------------------------------------------- A thunderstorm is defined as severe if it produces one or more of the following: - wind gusts of 90 km/h or greater. - hail of 2 centimetres in diameter or greater. - rainfall amounts of 50 millimetres or greater in one hour or less. - a tornado. Note: this forecast is issued twice daily from May 1 to September 30. END/OSPC
I want to extract only the following part:
Forecast of thunderstorm potential. Today..Isolated non severe thunderstorms over eastern And Northeastern Ontario. Tonight..Isolated non severe thunderstorms over eastern and Northeastern Ontario this evening. Saturday..Isolated non severe thunderstorms over extreme Southwestern Ontario mainly late in the afternoon and evening.
I am using C# on .NET 3.5. Any help is appreciated.
Answer 0 (score: 3)
One way you could do it (though not 100% ideal) is this:
string[] textSplit = theWholeTextString.Split(new string[] { "---------------------------------------------------------------------" }, StringSplitOptions.None);
string myText = textSplit[2];
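This assumes, of course, that the text you want is always in the third section and that the sections are always separated by a '------' line.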
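Because that assumption may not always hold, a slightly more defensive sketch checks how many sections the split produced before indexing, and trims the surrounding whitespace. The names SectionSplitter and TryGetSection and the shortened sample string are hypothetical, chosen only for illustration; the separator is copied from the code above and may need adjusting to the real text.

```csharp
using System;

public static class SectionSplitter
{
    // Assumption: the same dashed line used in the snippet above separates the sections.
    public const string Separator =
        "---------------------------------------------------------------------";

    // Returns true and the trimmed section when the index exists; false otherwise.
    public static bool TryGetSection(string text, int index, out string section)
    {
        section = null;
        if (string.IsNullOrEmpty(text) || index < 0)
        {
            return false;
        }

        string[] parts = text.Split(new string[] { Separator }, StringSplitOptions.None);
        if (index >= parts.Length)
        {
            return false;
        }

        section = parts[index].Trim();
        return true;
    }

    public static void Main()
    {
        // Shortened, made-up stand-in for the real bulletin text.
        string sample = "header " + Separator + " fog notice " + Separator
            + " Forecast of thunderstorm potential. Today..Isolated storms. "
            + Separator + " definitions";

        string forecast;
        if (TryGetSection(sample, 2, out forecast))
        {
            Console.WriteLine(forecast);
        }
        else
        {
            Console.WriteLine("Section not found.");
        }
    }
}
```

The out-parameter shape mirrors int.TryParse, so the caller has to handle the case where the page layout has changed instead of hitting an IndexOutOfRangeException.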
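Answer 1 (score: 0)
For us to be able to help, you need to tell us how the text you want to keep is defined. Is it a '---' line followed by 'Forecast', up to the next '---' line, or something else? A regular expression would do the job, but I can't give the exact syntax without more information.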
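For instance, if the wanted text is defined as everything from the heading "Forecast of thunderstorm potential." (directly after a dashed line) up to the next dashed line, a regular expression along these lines might work. This is only a sketch under that assumption; the names ForecastRegex and Extract are hypothetical, and the sample string is a shortened, made-up stand-in for the real text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ForecastRegex
{
    // A dashed line (40 or more dashes), the heading, then lazily everything
    // up to (but not including) the next dashed line.
    private static readonly Regex Pattern = new Regex(
        @"-{40,}\s*(Forecast of thunderstorm potential\.[\s\S]*?)\s*(?=-{40,})");

    // Returns the forecast section, or null when the pattern does not match.
    public static string Extract(string report)
    {
        Match match = Pattern.Match(report ?? string.Empty);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        string dashes = new string('-', 60);
        string sample = "Forecast of thunderstorm potential for the province of Ontario "
            + dashes + " Forecast of thunderstorm potential. Today..Isolated storms. "
            + dashes + " A thunderstorm is defined as severe if ...";

        Console.WriteLine(Extract(sample) ?? "No forecast section found.");
    }
}
```

The trailing period in the heading matters here: it keeps the pattern from matching the bulletin title, which contains the same words but continues with "for the province of Ontario".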
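Answer 2 (score: 0)
If the content between the ------------- lines is what you need, try a regular expression such as -{40,}\s*([\s\S]*?)\s*(?=-{40,}). The [\s\S] class has to be quantified (here lazily, with *?), otherwise it matches a single character only. The lookahead leaves the closing dashed line unconsumed so that it can open the next match. In the sample text the forecast is the second such block, and the text itself is in the captured group rather than in the whole match:
Regex.Matches(report, @"-{40,}\s*([\s\S]*?)\s*(?=-{40,})")[1].Groups[1].Value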
Answer 3 (score: 0)
It looks like the only thing separating the sections is the line of '-' characters.
How about using string.Split()? Here is an example:
string[] textArray = wholeText.Split(new string[] {"---------------------------------------------------------------------"}, StringSplitOptions.RemoveEmptyEntries);
string text = textArray[2];
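If the position of the section cannot be relied on, one variation (a sketch, not something proposed in the answers above) is to split on any run of dashes and pick the part that starts with the expected heading. LINQ is available in .NET 3.5. The names SectionFinder and FindSection, the 40-dash threshold, and the shortened sample are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SectionFinder
{
    // Splits on any run of 40 or more dashes and returns the first trimmed
    // section that begins with the given heading, or null if none does.
    public static string FindSection(string text, string heading)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(heading))
        {
            return null;
        }

        return Regex.Split(text, "-{40,}")
            .Select(part => part.Trim())
            .FirstOrDefault(part => part.StartsWith(heading, StringComparison.Ordinal));
    }

    public static void Main()
    {
        // Shortened, made-up stand-in for the real bulletin text.
        string dashes = new string('-', 60);
        string sample = "header " + dashes + " fog notice " + dashes
            + " Forecast of thunderstorm potential. Today..Isolated storms. "
            + dashes + " definitions";

        string forecast = FindSection(sample, "Forecast of thunderstorm potential.");
        Console.WriteLine(forecast ?? "No matching section.");
    }
}
```

Selecting by heading instead of by index means an extra or missing bulletin before the forecast does not silently return the wrong section; the caller gets null and can decide what to do.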