I have the following log file from a server, and I want to extract the XML from the string below.
2:00:11 PM >>Response: <?xml version="1.0" encoding="UTF-8"?>
<HotelML xmlns="http://www.xpegs.com/v2001Q3/HotelML"><Head><Route Destination="TR" Source="00"><Operation Action="Create" App="UltraDirect-d1c1_" AppVer="V1_1" DataPath="/HotelML" StartTime="2013-07-31T08:33:13.223+00:00" Success="true" TotalProcessTime="711"/></Route>............
</HotelML>
3:00:11 PM >>Response: <?xml version="1.0" encoding="UTF-8"?>
<HotelML xmlns="http://www.xpegs.com/v2001Q3/HotelML"><Head><Route Destination="TR" Source="00"><Operation Action="Create" App="UltraDirect-d1c1_" AppVer="V1_1" DataPath="/HotelML" StartTime="2013-07-31T08:33:13.223+00:00" Success="true" TotalProcessTime="711"/></Route>............
</HotelML>
5:00:11 PM >>Response: <?xml version="1.0" encoding="UTF-8"?>
<HotelML xmlns="http://www.xpegs.com/v2001Q3/HotelML"><Head><Route Destination="TR" Source="00"><Operation Action="Create" App="UltraDirect-d1c1_" AppVer="V1_1" DataPath="/HotelML" StartTime="2013-07-31T08:33:13.223+00:00" Success="true" TotalProcessTime="711"/></Route>............
</HotelML>
I wrote the following regular expression, but it matches only the first entry in the string. I want all of the XML strings returned as a collection.
(?<= Response:).*>.*</.*?>
Answer 0 (score: 2)
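Why not match from <HotelML to </HotelML>?
Something like the following (in .NET, with RegexOptions.Singleline so that . also matches line breaks; the lazy .*? stops at the first closing tag, so each response becomes its own match):
<HotelML .*?</HotelML>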
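A sketch of how such a pattern might be applied in C#, assuming the whole log is already in a string. `HotelMlExtractor` and `ExtractXml` are hypothetical names, and `\b` stands in for the literal space so that a tag without attributes would also be accepted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class HotelMlExtractor
{
    // Lazy ".*?" ends each match at the first closing tag.
    // Singleline lets '.' match '\n', because each document spans several lines.
    private static readonly Regex HotelMl =
        new Regex(@"<HotelML\b.*?</HotelML>", RegexOptions.Singleline);

    // Returns every <HotelML>...</HotelML> block found in the log text.
    public static List<string> ExtractXml(string log)
    {
        return HotelMl.Matches(log)
                      .Cast<Match>()
                      .Select(m => m.Value)
                      .ToList();
    }
}
```

Calling `HotelMlExtractor.ExtractXml(File.ReadAllText("text.txt"))` should then yield one string per response. The XML declaration is not part of each match, which is fine for most parsers.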
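Or simply go through the file line by line, and when you find a line that matches
^.* PM >>Response:.*$
read the lines that follow as XML, until the next matching line...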
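The line-by-line idea could be sketched in C# roughly as follows. `ResponseLogReader` and `ReadResponses` are hypothetical names, and accepting AM as well as PM is an assumption, since the sample only shows PM. Because the XML declaration sits on the marker line itself, the sketch captures whatever follows the marker rather than discarding that line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class ResponseLogReader
{
    // A marker line looks like: 2:00:11 PM >>Response: <?xml ...?>
    // Group 1 captures the text that follows the marker on the same line.
    private static readonly Regex Header =
        new Regex(@"^.* [AP]M >>Response:\s*(.*)$");

    public static List<string> ReadResponses(TextReader reader)
    {
        var responses = new List<string>();
        StringBuilder current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var header = Header.Match(line);
            if (header.Success)
            {
                // A new response starts: store the previous one, if any.
                if (current != null) responses.Add(current.ToString());
                current = new StringBuilder(header.Groups[1].Value);
            }
            else if (current != null)
            {
                // Any other line belongs to the response being collected.
                current.Append('\n').Append(line);
            }
        }
        // Store the last response, which has no marker line after it.
        if (current != null) responses.Add(current.ToString());
        return responses;
    }
}
```

Each returned string can then be handed to `XDocument.Parse` if parsed documents are needed. Reading through a `TextReader` avoids loading a large log into memory at once.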
Answer 1 (score: 1)
Here is another approach, which should leave you with a List<XDocument>:
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        var input = File.ReadAllText("text.txt");

        // Find each "h:mm:ss AM/PM >>Response: " prefix; the XML follows it.
        var xmlDocuments = Regex
            .Matches(input, @"([0-9AMP: ]*>>Response: )")
            .Cast<Match>()
            .Select(match =>
            {
                // The XML starts right after the prefix...
                var currentPosition = match.Index + match.Length;
                var nextMatch = match.NextMatch();

                // ...and runs up to the next prefix, or to the end of the input.
                return nextMatch.Success
                    ? input.Substring(currentPosition, nextMatch.Index - currentPosition)
                    : input.Substring(currentPosition);
            })
            .Select(s => XDocument.Parse(s))
            .ToList();
    }
}