C# Regex syntax help for parsing a string

Date: 2016-03-07 14:13:10

Tags: c# regex string

I have this:

var regex = new Regex(@"StartDate:(.*)EndDate:(.*)W.*Status:(.*)");

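So this will get me the values until it hits the W in the string, correct? I need it to stop at W OR S. I have tried a few different ways, but I can't get it to work. Does anyone have any info?

More information: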

            record = record.Replace(" ", "").Replace("\r\n", "").Replace("-", "/");
            var regex = new Regex(@"StartDate:(.*)EndDate:(.*)W.*Status:(.*)");
            string strStartDate = regex.Match(record).Groups[1].ToString();
            string strEndDate = regex.Match(record).Groups[2].ToString();
            // Compare against upper case: after ToUpper(), StartsWith("In") could never be true.
            string Status = regex.Match(record).Groups[3].ToString().ToUpper().StartsWith("IN") ? "Inactive" : "Active";

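I'm trying to parse a large string of values, and I only want 3 things: the start date, the end date, and the status (active/inactive). However, there are 3 different values of each (3 start dates, 3 end dates, 3 statuses).

The first 2 strings look like this: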

"Start Date: 

 2014-09-08 



End Date: 

 2017-09-07 



Warranty Type: 

 XXX 



Status: 

 Active 



Serial Number/IMEI: 

 XXXXXXXXXXX









Description:



XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

The third string looks like this:

"Start Date: 

 2014-09-08 



End Date: 

 2017-09-07 



Status: 

 Active 



Warranty Upgrade Code:



SVC_PRIORITY"

On the last string, I'm guessing because of the W.* after the end date, it doesn't show the dates.

I'm not getting the 2 dates from the last string.

4 answers:

Answer 0: (score: 1)

There is no need to replace the newlines, as in this example:

List<string> resultList = new List<string>();

var subjectString = @"Start Date: xxxxx
End Date: yyyy
Warranty Type: zzzz
Status: uuuu
Start Date: aaaa
End Date: bbbb
Status: cccc";

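// (.|\n)*? lazily skips any lines between End Date and Status (such as Warranty Type).
// That skip group is capture group 3, so the status ends up in group 4.
Regex regexObj = new Regex(@"Start Date: (.*?)\nEnd Date: (.*?)\n(.|\n)*?Status: (.*)");
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
    resultList.Add(matchResult.Groups[1].Value); // start date
    resultList.Add(matchResult.Groups[2].Value); // end date
    resultList.Add(matchResult.Groups[4].Value); // status
    matchResult = matchResult.NextMatch();
}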

Answer 1: (score: 1)

EDIT: Please try this regex parsing function:

using System.Collections.Generic;   // List<T>
using System.Text.RegularExpressions;
using System.Linq;
using System.Windows.Forms;

private static List<string[]> parseString(string input)
{
    var pattern = @"Start\s+Date:\s+([0-9-]+)\s+End\s+Date:\s+([0-9-]+)\s+(?:Warranty\s+Type:\s+\w+\s+)?Status:\s+(\w+)\s*";
    return Regex.Matches(input, pattern).Cast<Match>().ToList().ConvertAll(m => new string[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value });

}

// To show the result string
var result1 = parseString(str1);
string result_string = string.Join("\n", result1.ConvertAll(r => string.Format("Start Date: {0}\nEnd Date: {1}\nStatus: {2}", r)).ToArray());
MessageBox.Show(result_string);


EDIT2: For the OP's case, you can call the function from inside the foreach loop, like this:

foreach (HtmlElement el in webBrowser1.Document.GetElementsByTagName("div"))
{
    if (el.GetAttribute("className") == "fluid-row Borderfluid")
    {
        string record = el.InnerText;
        //if record is the string to parse
        var result = parseString(record);
        var result_string = string.Join("\n", result.ConvertAll(r => string.Format("Start Date: {0}\nEnd Date: {1}\nStatus: {2}", r)).ToArray());
        MessageBox.Show(result_string);
    }
}
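
To see why the optional group matters, here is a minimal sketch that runs the same pattern idea against both record shapes from the question. The class name `OptionalGroupDemo`, the `Extract` helper, and the two sample strings are made up for illustration; only the pattern comes from the function above (minus the trailing `\s*`).

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalGroupDemo
{
    // The Warranty Type line sits in an optional non-capturing group (?: ... )?,
    // so Status is always capture group 3 whether or not that line is present.
    private const string Pattern =
        @"Start\s+Date:\s+([0-9-]+)\s+End\s+Date:\s+([0-9-]+)\s+(?:Warranty\s+Type:\s+\w+\s+)?Status:\s+(\w+)";

    // Returns "start|end|status", or null when the record does not match.
    public static string Extract(string record)
    {
        Match m = Regex.Match(record, Pattern);
        return m.Success
            ? m.Groups[1].Value + "|" + m.Groups[2].Value + "|" + m.Groups[3].Value
            : null;
    }

    public static void Main()
    {
        // Hypothetical records mirroring the two shapes shown in the question.
        string withWarranty =
            "Start Date: \n 2014-09-08 \n\nEnd Date: \n 2017-09-07 \n\nWarranty Type: \n XXX \n\nStatus: \n Active \n";
        string withoutWarranty =
            "Start Date: \n 2014-09-08 \n\nEnd Date: \n 2017-09-07 \n\nStatus: \n Active \n\nWarranty Upgrade Code:\n\nSVC_PRIORITY";

        Console.WriteLine(Extract(withWarranty));    // 2014-09-08|2017-09-07|Active
        Console.WriteLine(Extract(withoutWarranty)); // 2014-09-08|2017-09-07|Active
    }
}
```

Note that `\w+` assumes the warranty type is a single word; if it can contain spaces, that part of the pattern would need to be loosened.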

Answer 2: (score: 0)

You can replace your code with the following (see the IDEONE demo):

// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
var s = @"Start Date: xxxxx
End Date: xxxx
Warranty Type: xxxx
Status: xxxx";
var res = Regex.Replace(s, @":\s+", ": ")                                       // Remove excessive whitespace after each colon
        .Split(new[] { "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries)     // Split into lines
        .Select(line => line.Split(new[] { ": " }, 2, StringSplitOptions.None)) // Split each line at the first `:`+space
        .Where(parts => parts.Length == 2)                                      // Skip lines without a `: ` separator
        .ToDictionary(parts => parts[0], parts => parts[1]);                    // Create a dictionary
string strStartDate = string.Empty;
string strEndDate = string.Empty;
string Status = string.Empty;
string Warranty = string.Empty;
// Demo & variable assignment
if (res.ContainsKey("Start Date")) {
    Console.WriteLine(res["Start Date"]);
    strStartDate = res["Start Date"];
}
if (res.ContainsKey("Warranty Type")) {
    Console.WriteLine(res["Warranty Type"]);
    Warranty = res["Warranty Type"];
}
if (res.ContainsKey("End Date")) {
    Console.WriteLine(res["End Date"]);
    strEndDate = res["End Date"];
}
if (res.ContainsKey("Status")) {
    Console.WriteLine(res["Status"]);
    Status = res["Status"];   // assign to the outer variable; redeclaring it here does not compile
}

Note that the best approach is to declare your own class with fields such as WarrantyType, StartDate, etc., and initialize it in the LINQ code.
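
A minimal sketch of that idea, assuming one record per input string. The `WarrantyRecord` class, its property names, and the `WarrantyRecordParser` helper are hypothetical names chosen for illustration, not something from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical record type; the property names are assumptions.
public class WarrantyRecord
{
    public string StartDate { get; set; }
    public string EndDate { get; set; }
    public string WarrantyType { get; set; }
    public string Status { get; set; }
}

public static class WarrantyRecordParser
{
    // Parses a single record. ToDictionary throws on duplicate keys,
    // so the text must not contain several records at once.
    public static WarrantyRecord Parse(string text)
    {
        Dictionary<string, string> fields = Regex.Replace(text, @":\s+", ": ")
            .Split(new[] { "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Split(new[] { ": " }, 2, StringSplitOptions.None))
            .Where(parts => parts.Length == 2)
            .ToDictionary(parts => parts[0].Trim(), parts => parts[1].Trim());

        return new WarrantyRecord
        {
            StartDate = Get(fields, "Start Date"),
            EndDate = Get(fields, "End Date"),
            WarrantyType = Get(fields, "Warranty Type"),
            Status = Get(fields, "Status")
        };
    }

    // Missing keys become an empty string instead of throwing.
    private static string Get(Dictionary<string, string> fields, string key)
    {
        string value;
        return fields.TryGetValue(key, out value) ? value : string.Empty;
    }

    public static void Main()
    {
        // Hypothetical input shaped like the third string in the question.
        WarrantyRecord r = Parse(
            "Start Date: \n 2014-09-08 \n\nEnd Date: \n 2017-09-07 \n\nStatus: \n Active \n\nWarranty Upgrade Code:\n\nSVC_PRIORITY");
        Console.WriteLine(r.StartDate + " / " + r.EndDate + " / " + r.Status);
    }
}
```

With a typed record, the optional Warranty Type line simply ends up as an empty string, and callers no longer need the repeated ContainsKey checks.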

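Answer 3: (score: 0)

Avoid the .* catch-all; it is what gets regex pattern writers into trouble. Instead, build the pattern to match the specific patterns that always occur in the data.

Your pattern is two dates of the form \d\d\d\d-\d\d-\d\d; the rest is anchor text, which should be used as static anchors that can be skipped over.

Here is an example that finds the date patterns. Once found, the regex places each one in a named match capture group (?<GroupNameHere>...), and Linq extracts each match into an anonymous-type entity and parses the dates into DateTime values.

Data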

Note that the first date is reversed, as per your example.

var data = @"Start Date:

 2014-09-08

End Date:

 2017-09-07

Status:

 Active

Start Date:

 2014-09-09

End Date:

 2017-09-10

Status:

 In-Active
 ";

Pattern

string pattern = @"
^Start\sDate:\s+                     # An anchor of start date that always starts at the BOL
(?<Start>\d\d\d\d-\d\d-\d\d)         # actual start date pattern
\s+                                  # a lot of space including \r\n
^End\sDate:\s+                       # End date anchor and space
(?<End>\d\d\d\d-\d\d-\d\d)           # pattern of the end date.
\s+                                  # Same pattern as above for Status
^Status:\s+
(?<Status>[^\s]+)
 ";

Processing

// ExplicitCapture tells the parser to ignore unnamed groups; only named groups (?<Name>...) are captured.
// Multiline makes ^ and $ match at the beginning and end of each line, not just of the whole buffer.
// IgnorePatternWhitespace lets us comment the pattern; it does not affect processing.
var results = Regex.Matches(data, pattern, RegexOptions.ExplicitCapture |
                                           RegexOptions.Multiline       |
                                           RegexOptions.IgnorePatternWhitespace)
     .OfType<Match>()
     .Select (mt => new
            {
                Status    = mt.Groups["Status"].Value,
                StartDate = DateTime.Parse(mt.Groups["Start"].Value),
                EndDate   = DateTime.Parse(mt.Groups["End"].Value)
            })
     .ToList();
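
One thing to keep in mind: DateTime.Parse uses the current culture. Since the dates in the sample always look like yyyy-MM-dd, a stricter alternative could be sketched as below. The `IsoDate` helper is a hypothetical name, and the fixed format is an assumption based on the sample data.

```csharp
using System;
using System.Globalization;

public static class IsoDate
{
    // Parses a yyyy-MM-dd date independently of the machine's culture.
    // Returns null instead of throwing when the text is not in that format.
    public static DateTime? Parse(string text)
    {
        DateTime value;
        bool ok = DateTime.TryParseExact(
            text, "yyyy-MM-dd", CultureInfo.InvariantCulture, DateTimeStyles.None, out value);
        return ok ? value : (DateTime?)null;
    }

    public static void Main()
    {
        DateTime? start = Parse("2014-09-08");
        Console.WriteLine(start.HasValue);              // True
        Console.WriteLine(Parse("09/08/2014").HasValue); // False
    }
}
```

In the Select above, `DateTime.Parse(mt.Groups["Start"].Value)` could then be swapped for `IsoDate.Parse(mt.Groups["Start"].Value)` if nullable dates are acceptable.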
