I have this:
var regex = new Regex(@"StartDate:(.*)EndDate:(.*)W.*Status:(.*)");
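So this gets me the values up until it hits a W in the string, correct? I need it to stop at W OR S. I've tried a few different approaches but haven't gotten it to work. Does anyone have any pointers?
More info: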
record = record.Replace(" ", "").Replace("\r\n", "").Replace("-", "/");
var regex = new Regex(@"StartDate:(.*)EndDate:(.*)W.*Status:(.*)");
string strStartDate = regex.Match(record).Groups[1].ToString();
string strEndDate = regex.Match(record).Groups[2].ToString();
string Status = regex.Match(record).Groups[3].ToString().ToUpper().StartsWith("In") ? "Inactive" : "Active";
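I'm trying to parse a large string of values, and I only want 3 things: the start date, the end date, and the status (active/inactive). However, there are 3 different values for each (3 start dates, 3 end dates, 3 statuses).
The first 2 strings look like this: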
"Start Date:
2014-09-08
End Date:
2017-09-07
Warranty Type:
XXX
Status:
Active
Serial Number/IMEI:
XXXXXXXXXXX
Description:
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
The third string looks like this:
"Start Date:
2014-09-08
End Date:
2017-09-07
Status:
Active
Warranty Upgrade Code:
SVC_PRIORITY"
On the last string, I'm guessing because of the `W.*` after the end date, it doesn't show the dates. I'm not getting the 2 dates for the last string.
Answer 0 (score: 1)
There is no need to replace the newlines in your sample:
List<string> resultList = new List<string>();
var subjectString = @"Start Date: xxxxx
End Date: yyyy
Warranty Type: zzzz
Status: uuuu
Start Date: aaaa
End Date: bbbb
Status: cccc";
Regex regexObj = new Regex(@"Start Date: (.*?)\nEnd Date: (.*?)\n(.|\n)*?Status: (.*)");
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
resultList.Add(matchResult.Groups[1].Value);
resultList.Add(matchResult.Groups[2].Value);
resultList.Add(matchResult.Groups[4].Value);
matchResult = matchResult.NextMatch();
}
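One thing to watch for with the pattern above: it matches `\n` literally, so if the input uses Windows line endings (`\r\n`), each captured value would end with a stray `\r`. Below is a minimal sketch of a variant that tolerates both line endings. It assumes the layout of the sample in this answer (label and value on the same line, labels in the same order); the `WarrantyParser` and `Extract` names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WarrantyParser
{
    // [^\r\n]* keeps each captured value on a single line,
    // \r?\n accepts both Unix and Windows line endings, and
    // (?:[^\r\n]*\r?\n)*? lazily skips any lines between End Date and Status.
    private static readonly Regex RecordRegex = new Regex(
        @"Start Date:[ \t]*([^\r\n]*)\r?\n" +
        @"End Date:[ \t]*([^\r\n]*)\r?\n" +
        @"(?:[^\r\n]*\r?\n)*?" +
        @"Status:[ \t]*([^\r\n]*)");

    // Returns one { start, end, status } array per record found in the input.
    public static List<string[]> Extract(string input)
    {
        var results = new List<string[]>();
        foreach (Match m in RecordRegex.Matches(input))
        {
            results.Add(new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value });
        }
        return results;
    }
}
```

Because the optional lines are skipped with a non-capturing group, the status ends up in group 3 rather than group 4.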
Answer 1 (score: 1)
EDIT: Try a parsing function based on a regular expression:
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Linq;
using System.Windows.Forms;
private static List<string[]> parseString(string input)
{
var pattern = @"Start\s+Date:\s+([0-9-]+)\s+End\s+Date:\s+([0-9-]+)\s+(?:Warranty\s+Type:\s+\w+\s+)?Status:\s+(\w+)\s*";
return Regex.Matches(input, pattern).Cast<Match>().ToList().ConvertAll(m => new string[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value });
}
// To show the result string
var result1 = parseString(str1);
string result_string = string.Join("\n", result1.ConvertAll(r => string.Format("Start Date: {0}\nEnd Date: {1}\nStatus: {2}", r)).ToArray());
MessageBox.Show(result_string);
EDIT 2: For the OP's case, you can call the function from inside the foreach loop, like this:
foreach (HtmlElement el in webBrowser1.Document.GetElementsByTagName("div"))
{
if (el.GetAttribute("className") == "fluid-row Borderfluid")
{
string record = el.InnerText;
//if record is the string to parse
var result = parseString(record);
var result_string = string.Join("\n", result.ConvertAll(r => string.Format("Start Date: {0}\nEnd Date: {1}\nStatus: {2}", r)).ToArray());
MessageBox.Show(result_string);
}
}
Answer 2 (score: 0)
You can replace your code with the following (see IDEONE demo):
var s = @"Start Date: xxxxx
End Date: xxxx
Warranty Type: xxxx
Status: xxxx";
var res = Regex.Replace(s, @":\s+", ": ") // Remove excessive whitespace
    .Split(new[] { "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries) // Split into lines
    .Select(p => p.Split(new[] { ": " }, StringSplitOptions.None)) // Split each line with `:`+space
    .ToDictionary(n => n[0], n => n[1]); // Create a dictionary
string strStartDate = string.Empty;
string strEndDate = string.Empty;
string Status = string.Empty;
string Warranty = string.Empty;
// Demo & variable assignment
if (res.ContainsKey("Start Date")) {
Console.WriteLine(res["Start Date"]);
strStartDate = res["Start Date"];
}
if (res.ContainsKey("Warranty Type")) {
Console.WriteLine(res["Warranty Type"]);
Warranty = res["Warranty Type"];
}
if (res.ContainsKey("End Date")) {
Console.WriteLine(res["End Date"]);
strEndDate = res["End Date"];
}
if (res.ContainsKey("Status")) {
Console.WriteLine(res["Status"]);
Status = res["Status"]; // assign the variable declared above; redeclaring it here does not compile
}
Note that the best approach would be to declare your own class with fields such as WarrantyType, StartDate, etc., and initialize it in the LINQ code.
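A rough sketch of what such a class could look like follows. The `WarrantyRecord` class and its `Parse` method are hypothetical names, and the sketch assumes every line has the form `Label: value` as in the sample above; lines without a colon are skipped, and if a label repeats, the first value is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class WarrantyRecord
{
    public string StartDate { get; set; }
    public string EndDate { get; set; }
    public string WarrantyType { get; set; } // stays null when the line is absent
    public string Status { get; set; }

    public static WarrantyRecord Parse(string text)
    {
        // Split into lines, then split each line at the first ':' only,
        // so values that themselves contain ':' stay intact.
        Dictionary<string, string> fields = text
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Split(new[] { ':' }, 2))
            .Where(parts => parts.Length == 2)
            .GroupBy(parts => parts[0].Trim())
            .ToDictionary(g => g.Key, g => g.First()[1].Trim());

        return new WarrantyRecord
        {
            StartDate = Get(fields, "Start Date"),
            EndDate = Get(fields, "End Date"),
            WarrantyType = Get(fields, "Warranty Type"),
            Status = Get(fields, "Status")
        };
    }

    // Returns the value for the key, or null when the key is missing.
    private static string Get(Dictionary<string, string> fields, string key)
    {
        string value;
        return fields.TryGetValue(key, out value) ? value : null;
    }
}
```

With this in place the chain of `ContainsKey` checks is no longer needed; a missing field simply shows up as `null` on the returned object.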
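Answer 3 (score: 0)
Avoid the `.*` catch-all, which gets regex pattern writers into trouble. Instead, build the pattern to match the specific patterns that always occur in the data.
Your patterns are the two dates, `\d\d\d\d-\d\d-\d\d`; the rest is anchor text, which should be used as static anchors that can be skipped over.
Here is an example that looks for the date patterns. Once found, the regex places them into named match capture groups, `(?<GroupNameHere>...)`, and Linq extracts each match into a dynamic entity and parses the date times.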
Data
Note that, per your example, the first date is reversed.
var data = @"Start Date:
2014-09-08
End Date:
2017-09-07
Status:
Active
Start Date:
2014-09-09
End Date:
2017-09-10
Status:
In-Active
";
Pattern
string pattern = @"
^Start\sDate:\s+ # An anchor of start date that always starts at the BOL
(?<Start>\d\d\d\d-\d\d-\d\d) # actual start date pattern
\s+ # a lot of space including \r\n
^End\sDate:\s+ # End date anchor and space
(?<End>\d\d\d\d-\d\d-\d\d) # pattern of the end date.
\s+ # Same pattern as above for Status
^Status:\s+
(?<Status>[^\s]+)
";
Processing
// ExplicitCapture tells the parser to ignore any unnamed groups, i.e. plain parentheses (..).
// Multiline makes ^ and $ match at the beginning and end of each line, not just of the whole buffer.
// IgnorePatternWhitespace lets us comment the pattern; it does not affect processing.
var results = Regex.Matches(data, pattern, RegexOptions.ExplicitCapture |
                                           RegexOptions.Multiline |
                                           RegexOptions.IgnorePatternWhitespace)
    .OfType<Match>()
    .Select(mt => new
    {
        Status = mt.Groups["Status"].Value,
        StartDate = DateTime.Parse(mt.Groups["Start"].Value),
        EndDate = DateTime.Parse(mt.Groups["End"].Value)
    });
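To try the pieces above end to end, something like the following might work. It is only a sketch under a few assumptions: the `WarrantyDates` and `FindRecords` names are mine, the pattern is the one above written without the inline comments, and `DateTime.ParseExact` with the invariant culture is swapped in for `DateTime.Parse` on the assumption that the dates are always in `yyyy-MM-dd` form.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class WarrantyDates
{
    // Same anchors and named groups as the pattern above; whitespace inside
    // the pattern is ignored because of RegexOptions.IgnorePatternWhitespace.
    private const string Pattern = @"
        ^Start\sDate:\s+  (?<Start>\d\d\d\d-\d\d-\d\d)  \s+
        ^End\sDate:\s+    (?<End>\d\d\d\d-\d\d-\d\d)    \s+
        ^Status:\s+       (?<Status>\S+)";

    // Returns (start, end, status) for every record in the input.
    public static List<Tuple<DateTime, DateTime, string>> FindRecords(string data)
    {
        var options = RegexOptions.ExplicitCapture
                    | RegexOptions.Multiline
                    | RegexOptions.IgnorePatternWhitespace;

        return Regex.Matches(data, Pattern, options)
            .Cast<Match>()
            .Select(m => Tuple.Create(
                ParseDate(m.Groups["Start"].Value),
                ParseDate(m.Groups["End"].Value),
                m.Groups["Status"].Value))
            .ToList();
    }

    // Culture-independent parsing of a yyyy-MM-dd date.
    private static DateTime ParseDate(string text)
    {
        return DateTime.ParseExact(text, "yyyy-MM-dd", CultureInfo.InvariantCulture);
    }
}
```

Calling `WarrantyDates.FindRecords(data)` on the sample data should give one tuple per record. Note that, like the pattern it is based on, this expects `Status:` to follow the end date directly, so a record with a `Warranty Type:` line in between would not match without an extra optional part in the pattern.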