从给定条件的字符串数组中提取字符串,即类的属性

时间:2014-05-01 04:42:35

标签: c# arrays regex linq

用例:我导入文本文件,需要读取包含4行的内容。通过这4行,我解析出没有预定义的字符串,而是动态的。

以下是我从@ zx81收集的一个例子:

输入:

on Apr 28, 2014 at 22:00
an Employee John Doe accessed
server - TPCX123
AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec

所以考虑到以上4行,我想要保留它们的回车(即4行)或者将它全部变成一个字符串(即只有一行),我将提取属性并放入它们通过Class的属性进入内存,例如ReportDateReportTimeEmployeeNameServerNameAccessTypeReasonTypeProgramIdStart,{{ 1}}。

期望输出:

Length

这就是我想要的 - 在等号的RHS上找到的所有项目,即分配给内存中ReportDate = Apr 28, 2014 ReportTime = 22:00 EmployeeName = John Doe ServerName = TnCX123 AccessType = AccessType2 ReasonType = ReasonType1 ProgramId = Px2x3x Start = No22 Length = 0.0 sec 中找到的特定属性的某些字符串,它们最终响应数据库表的列。从上面的例子中,属性Object将始终位于相同的位置(特定字符串之间),因此将解析其值,例如, “约翰·多伊”。当然,对于我引入的每个文件,这些值都是不同的,因此是它的动态部分。

希望这有帮助,谢谢。

1 个答案:

答案 0 :(得分:2)

鉴于你的数据,这样的东西会输出你想要的东西:

<强>输出:

ReportDate = Apr 28, 2014
ReportTime = 22:00
EmployeeName = John Doe
ServerName = TnCX123
AccessType = AccessType2
ReasonType = ReasonType1
ProgramId = Px2x3x
Start = No22
Length = 0.0 sec

<强>代码:

using System;
using System.Text.RegularExpressions;
using System.Collections.Specialized;
class Program
{

    static void Main()
    {
    string s1 = @"on Apr 28, 2014 at 22:00
an Employee John Doe accessed
server - TPCX123
AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec";

    try
    {
    var myRegex = new Regex(@"(?s)^on\s+([\w, ]+?) at (\d{2}:\d{2}).*?Employee ([\w ]+) accessed.*?server - (\w+).*?(\w+) was (\w+) - program: (\w+), start: (\w+), (\d+\.\d+ \w+)");
    string date = myRegex.Match(s1).Groups[1].Value;
    string time = myRegex.Match(s1).Groups[2].Value;
    string name = myRegex.Match(s1).Groups[3].Value;
    string server = myRegex.Match(s1).Groups[4].Value;
    string access = myRegex.Match(s1).Groups[5].Value;
    string reason = myRegex.Match(s1).Groups[6].Value;
    string prog = myRegex.Match(s1).Groups[7].Value;
    string start = myRegex.Match(s1).Groups[8].Value;
    string length = myRegex.Match(s1).Groups[9].Value;
    Console.WriteLine("ReportDate = " + date);
    Console.WriteLine("ReportTime = " + time);
    Console.WriteLine("EmployeeName = " + name);
    Console.WriteLine("ServerName = " + server);
    Console.WriteLine("AccessType = " + access);
    Console.WriteLine("ReasonType = " + reason);
    Console.WriteLine("ProgramId = " + prog);
    Console.WriteLine("Start = " + start);
    Console.WriteLine("Length = " + length);
    }
    catch (ArgumentException ex)
    {
    // We have a syntax error
    }

    Console.WriteLine("\nPress Any Key to Exit.");
    Console.ReadKey();
    } // END Main
} // END Program

调整

然而,要调整它,你将不得不刷新你的正则表达式。

为了让您开始,这里是代码中正则表达式的逐令牌解释。然后,我建议您访问常见问题解答中提及的FAQRexEgg和其他网站。

@"
(?                 # Use these options for the whole regular expression
   s                  # Dot matches line breaks
)
^                  # Assert position at the beginning of the string
on                 # Match the character string “on” literally (case sensitive)
\s                 # Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line)
   +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(                  # Match the regex below and capture its match into backreference number 1
   [\w,\ ]            # Match a single character present in the list below
                         # A “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
                         # A single character from the list “, ”
      +?                 # Between one and unlimited times, as few times as possible, expanding as needed (lazy)
)
\ at\              # Match the character string “ at ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 2
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      {2}                # Exactly 2 times
   :                  # Match the character “:” literally
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      {2}                # Exactly 2 times
)
.                  # Match any single character
   *?                 # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Employee\          # Match the character string “Employee ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 3
   [\w\ ]             # Match a single character present in the list below
                         # A “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
                         # The literal character “ ”
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ accessed         # Match the character string “ accessed” literally (case sensitive)
.                  # Match any single character
   *?                 # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
server\ -\         # Match the character string “server - ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 4
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
.                  # Match any single character
   *?                 # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
(                  # Match the regex below and capture its match into backreference number 5
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ was\             # Match the character string “ was ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 6
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ -\ program:\     # Match the character string “ - program: ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 7
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
,\ start:\         # Match the character string “, start: ” literally (case sensitive)
(                  # Match the regex below and capture its match into backreference number 8
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
,\                 # Match the character string “, ” literally
(                  # Match the regex below and capture its match into backreference number 9
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \.                 # Match the character “.” literally
   \d                 # Match a single character that is a “digit” (0–9 in any Unicode script)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \                  # Match the character “ ” literally
   \w                 # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
      +                  # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
"