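Use case: I import a text file and need to read its contents, which consist of 4 lines. From these 4 lines I parse out strings that are not predefined but dynamic.
Here is an example I gathered from @zx81:
Input: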
on Apr 28, 2014 at 22:00
an Employee John Doe accessed
server - TPCX123
AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec
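Given the 4 lines above, I would like to either keep their carriage returns (i.e. 4 lines) or turn everything into a single string (i.e. one line). I will then extract the values and place them in memory through a Class's properties, such as ReportDate, ReportTime, EmployeeName, ServerName, AccessType, ReasonType, ProgramId, Start, and Length.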
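Desired output:
ReportDate = Apr 28, 2014
ReportTime = 22:00
EmployeeName = John Doe
ServerName = TPCX123
AccessType = AccessType2
ReasonType = ReasonType1
ProgramId = Px2x3x
Start = No22
Length = 0.0 sec
This is what I am looking for: all the items found on the RHS of the equals sign, i.e. the strings assigned to specific properties of the Object in memory, which eventually correspond to the columns of a database table. In the example above, the EmployeeName property will always be in the same position (between specific strings), so its value, e.g. "John Doe", can be parsed out. Of course, these values will be different for each file I bring in, hence the dynamic part.
Hope this helps, and thanks.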
Answer 0 (score: 2)
Given your data, something like this will output what you want:
Output:
ReportDate = Apr 28, 2014
ReportTime = 22:00
EmployeeName = John Doe
ServerName = TPCX123
AccessType = AccessType2
ReasonType = ReasonType1
ProgramId = Px2x3x
Start = No22
Length = 0.0 sec
Code:
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // The four input lines, kept with their line breaks.
        string s1 = @"on Apr 28, 2014 at 22:00
an Employee John Doe accessed
server - TPCX123
AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec";
        try
        {
            var myRegex = new Regex(@"(?s)^on\s+([\w, ]+?) at (\d{2}:\d{2}).*?Employee ([\w ]+) accessed.*?server - (\w+).*?(\w+) was (\w+) - program: (\w+), start: (\w+), (\d+\.\d+ \w+)");
            // Run the match once and read every group from the same result.
            Match m = myRegex.Match(s1);
            if (!m.Success)
            {
                Console.WriteLine("The input does not match the expected format.");
            }
            else
            {
                string date = m.Groups[1].Value;
                string time = m.Groups[2].Value;
                string name = m.Groups[3].Value;
                string server = m.Groups[4].Value;
                string access = m.Groups[5].Value;
                string reason = m.Groups[6].Value;
                string prog = m.Groups[7].Value;
                string start = m.Groups[8].Value;
                string length = m.Groups[9].Value;
                Console.WriteLine("ReportDate = " + date);
                Console.WriteLine("ReportTime = " + time);
                Console.WriteLine("EmployeeName = " + name);
                Console.WriteLine("ServerName = " + server);
                Console.WriteLine("AccessType = " + access);
                Console.WriteLine("ReasonType = " + reason);
                Console.WriteLine("ProgramId = " + prog);
                Console.WriteLine("Start = " + start);
                Console.WriteLine("Length = " + length);
            }
        }
        catch (ArgumentException ex)
        {
            // The pattern has a syntax error
            Console.WriteLine("Invalid regex: " + ex.Message);
        }
        Console.WriteLine("\nPress Any Key to Exit.");
        Console.ReadKey();
    } // END Main
} // END Program
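Since the question wants the values stored as class properties rather than printed, here is a minimal sketch of one way to do that with named capture groups. The `AccessReport` class and the `ReportParser.Parse` method are hypothetical names introduced for illustration. The property names come from the question, and the pattern is the one from the code above with names added to the groups.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container; the property names are taken from the question.
class AccessReport
{
    public string ReportDate { get; set; }
    public string ReportTime { get; set; }
    public string EmployeeName { get; set; }
    public string ServerName { get; set; }
    public string AccessType { get; set; }
    public string ReasonType { get; set; }
    public string ProgramId { get; set; }
    public string Start { get; set; }
    public string Length { get; set; }
}

static class ReportParser
{
    // Same pattern as in the answer, with (?<name>...) instead of numbered groups.
    private static readonly Regex Pattern = new Regex(
        @"(?s)^on\s+(?<date>[\w, ]+?) at (?<time>\d{2}:\d{2})" +
        @".*?Employee (?<name>[\w ]+) accessed" +
        @".*?server - (?<server>\w+)" +
        @".*?(?<access>\w+) was (?<reason>\w+)" +
        @" - program: (?<prog>\w+), start: (?<start>\w+), (?<length>\d+\.\d+ \w+)");

    // Returns null when the text does not have the expected shape.
    public static AccessReport Parse(string text)
    {
        Match m = Pattern.Match(text);
        if (!m.Success) return null;

        return new AccessReport
        {
            ReportDate = m.Groups["date"].Value,
            ReportTime = m.Groups["time"].Value,
            EmployeeName = m.Groups["name"].Value,
            ServerName = m.Groups["server"].Value,
            AccessType = m.Groups["access"].Value,
            ReasonType = m.Groups["reason"].Value,
            ProgramId = m.Groups["prog"].Value,
            Start = m.Groups["start"].Value,
            Length = m.Groups["length"].Value
        };
    }
}

class Program
{
    static void Main()
    {
        string text = "on Apr 28, 2014 at 22:00\nan Employee John Doe accessed\n" +
                      "server - TPCX123\n" +
                      "AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec";

        AccessReport report = ReportParser.Parse(text);
        if (report != null)
        {
            Console.WriteLine(report.EmployeeName + " / " + report.ServerName);
        }
    }
}
```

Named groups keep the mapping readable if the pattern is later reordered or extended, because the property assignments no longer depend on group numbers.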
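Tweaking
To tweak it, however, you will have to brush up on your regex.
To get you started, here is a token-by-token explanation of the regex in the code. After that, I suggest you visit the FAQ, RexEgg, and other regex sites.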
@"
(? # Use these options for the whole regular expression
s # Dot matches line breaks
)
^ # Assert position at the beginning of the string
on # Match the character string “on” literally (case sensitive)
\s # Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
( # Match the regex below and capture its match into backreference number 1
[\w,\ ] # Match a single character present in the list below
# A “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
# A single character from the list “, ”
+? # Between one and unlimited times, as few times as possible, expanding as needed (lazy)
)
\ at\ # Match the character string “ at ” literally (case sensitive)
( # Match the regex below and capture its match into backreference number 2
\d # Match a single character that is a “digit” (0–9 in any Unicode script)
{2} # Exactly 2 times
: # Match the character “:” literally
\d # Match a single character that is a “digit” (0–9 in any Unicode script)
{2} # Exactly 2 times
)
. # Match any single character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Employee\ # Match the character string “Employee ” literally (case sensitive)
( # Match the regex below and capture its match into backreference number 3
[\w\ ] # Match a single character present in the list below
# A “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
# The literal character “ ”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ accessed # Match the character string “ accessed” literally (case sensitive)
. # Match any single character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
server\ -\ # Match the character string “server - ” literally (case sensitive)
( # Match the regex below and capture its match into backreference number 4
\w # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
. # Match any single character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
( # Match the regex below and capture its match into backreference number 5
\w # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ was\ # Match the character string “ was ” literally (case sensitive)
( # Match the regex below and capture its match into backreference number 6
\w # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\ -\ program:\ # Match the character string “ - program: ” literally (case sensitive)
( # Match the regex below and capture its match into backreference number 7
\w # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
,\ start:\ # Match the character string “, start: ” literally (case sensitive)
( # Match the regex below and capture its match into backreference number 8
\w # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
,\ # Match the character string “, ” literally
( # Match the regex below and capture its match into backreference number 9
\d # Match a single character that is a “digit” (0–9 in any Unicode script)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\. # Match the character “.” literally
\d # Match a single character that is a “digit” (0–9 in any Unicode script)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\ # Match the character “ ” literally
\w # Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
"
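The commented layout above can also be kept in the source code itself. A minimal sketch, assuming you want the annotated form to stay runnable: with `RegexOptions.IgnorePatternWhitespace`, .NET ignores unescaped whitespace in the pattern and treats `#` up to the end of the line as a comment, which is why literal spaces are written as `\ ` here, as in the breakdown. `RegexOptions.Singleline` stands in for the inline `(?s)`. The class name `CommentedPattern` and the shortened comments are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

static class CommentedPattern
{
    // Same pattern as in the answer, spread over several lines with comments.
    public static readonly Regex Report = new Regex(@"
        ^on\s+ ([\w,\ ]+?)                 # 1: ReportDate (lazy, stops before ' at ')
        \ at\ (\d{2}:\d{2})                # 2: ReportTime
        .*? Employee\ ([\w\ ]+)\ accessed  # 3: EmployeeName
        .*? server\ -\ (\w+)               # 4: ServerName
        .*? (\w+)\ was\ (\w+)              # 5: AccessType, 6: ReasonType
        \ -\ program:\ (\w+)               # 7: ProgramId
        ,\ start:\ (\w+)                   # 8: Start
        ,\ (\d+\.\d+\ \w+)                 # 9: Length
        ", RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);
}

class Program
{
    static void Main()
    {
        string text = "on Apr 28, 2014 at 22:00\nan Employee John Doe accessed\n" +
                      "server - TPCX123\n" +
                      "AccessType2 was ReasonType1 - program: Px2x3x, start: No22, 0.0 sec";

        Match m = CommentedPattern.Report.Match(text);
        if (m.Success)
        {
            Console.WriteLine("EmployeeName = " + m.Groups[3].Value);
            Console.WriteLine("Length = " + m.Groups[9].Value);
        }
    }
}
```

Each escaped space is followed by another token on the same line, so an editor that trims trailing whitespace cannot silently change the pattern.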