My string looks like this:
rta_geo5: 09/24/14 15:10:38 - Reset_count = 6
rta_geo5: 09/24/14 15:10:38 - restarting
rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines
My goal is to split this string into three columns so that I can put it into a database table:
----------------------------------------------------------------
| COL1     | COL2              | COL3                          |
----------------------------------------------------------------
| rta_geo5 | 09/24/14 15:10:38 | Reset_count = 6               |
----------------------------------------------------------------
| rta_geo5 | 09/24/14 15:10:38 | restarting                    |
----------------------------------------------------------------
| rta_geo5 | 09/24/14 15:10:38 | memory allocation: 3500 lines |
----------------------------------------------------------------
Can I use the following statement?
string[] substrings = Regex.Split(input, pattern);
I just need the correct regular expression.
Answer 0 (score: 1)
Pattern:
Regex ptrn = new Regex(@"^(?<col1>[^:]+):\s+(?<col2>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+-\s+(?<col3>[^\r\n]+?)\s*$",
    RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase | RegexOptions.Multiline);
Usage:
string s = @"rta_geo5: 09/24/14 15:10:38 - Reset_count = 6
rta_geo5: 09/24/14 15:10:38 - restarting
rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines";
var matches = ptrn.Matches(s);
Accessing the groups:
matches.OfType<Match>()
    .Select(match => new string[]
    {
        match.Groups["col1"].Value,
        match.Groups["col2"].Value,
        match.Groups["col3"].Value
    })
    .ToList().ForEach(a => System.Console.WriteLine(string.Join("\t|\t", a)));
Or:
foreach (Match match in matches)
{
    string col1 = match.Groups["col1"].Value;
    string col2 = match.Groups["col2"].Value;
    string col3 = match.Groups["col3"].Value;
    System.Console.WriteLine(col1 + "\t|\t" + col2 + "\t|\t" + col3);
}
Output:
rta_geo5 | 09/24/14 15:10:38 | Reset_count = 6
rta_geo5 | 09/24/14 15:10:38 | restarting
rta_geo5 | 09/24/14 15:10:38 | memory allocation: 3500 lines
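Since the question asks specifically about Regex.Split, here is a minimal sketch of that route for a single line. It assumes every line has the shape "identifier: timestamp - message"; the class and method names (SplitSketch, SplitLine) are made up for illustration. The point is the count argument of the instance method Split: limiting the result to 3 substrings keeps a ": " or " - " that appears inside the message from being split again.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitSketch
{
    // Delimiters: the ": " after the identifier, or the " - " after the timestamp.
    private static readonly Regex Delimiter = new Regex(@":\s+|\s+-\s+");

    // Hypothetical helper: splits one log line into at most three parts.
    public static string[] SplitLine(string line)
    {
        // The count of 3 leaves everything after the second delimiter intact,
        // so "memory allocation: 3500 lines" stays in one piece.
        return Delimiter.Split(line.Trim(), 3);
    }

    public static void Main()
    {
        string[] parts = SplitLine("rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines");
        Console.WriteLine(string.Join(" | ", parts));
    }
}
```

Unlike the Matches approach above, this does not validate the line: input that does not follow the expected shape simply comes back with fewer (or wrongly cut) parts, so check the array length before using it.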
Answer 1 (score: 0)
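Answer 2 (score: 0)
I would not use a regex (or String.Split) for this, but a loop that parses each line. I would also use a custom class that maps to the database table, to improve readability and reusability.
Class (simplified):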
using System;
using System.Globalization;

public class Data
{
    public string Token1 { get; set; } // use a meaningful name
    public string Token2 { get; set; } // use a meaningful name
    public DateTime Date { get; set; } // use a meaningful name

    public override string ToString()
    {
        return string.Format("Token1:[{0}] Date:[{1}] Token2:[{2}]",
            Token1,
            Date.ToString("MM/dd/yy HH:mm:ss", CultureInfo.InvariantCulture),
            Token2);
    }
}
Your sample string:
string data = @"rta_geo5: 09/24/14 15:10:38 - Reset_count = 6
rta_geo5: 09/24/14 15:10:38 - restarting
rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines";
Now you can use this loop, which relies only on plain string methods, to parse it into a List<Data>:
// Accept both "\r\n" and "\n": Environment.NewLine alone would miss "\n"-only input on Windows.
string[] lines = data.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
List<Data> allData = new List<Data>();
foreach (string line in lines)
{
    string token1 = null, token2 = null;
    DateTime dt;
    int firstColonIndex = line.IndexOf(": ");
    if (firstColonIndex >= 0)
    {
        token1 = line.Remove(firstColonIndex); // everything before the first ": "
        firstColonIndex += 2; // start next search after first token to find DateTime
        int indexOfMinus = line.IndexOf(" - ", firstColonIndex);
        if (indexOfMinus >= 0)
        {
            string datePart = line.Substring(firstColonIndex, indexOfMinus - firstColonIndex);
            if (DateTime.TryParseExact(datePart, "MM/dd/yy HH:mm:ss", CultureInfo.InvariantCulture, DateTimeStyles.None, out dt))
            {
                indexOfMinus += 3; // skip the " - " separator; the rest of the line is the last token
                token2 = line.Substring(indexOfMinus);
                Data d = new Data { Token1 = token1, Token2 = token2, Date = dt };
                allData.Add(d);
            }
        }
    }
}
Test:
foreach (Data d in allData)
    Console.WriteLine(d.ToString());
Token1:[rta_geo5] Date:[09/24/14 15:10:38] Token2:[Reset_count = 6]
Token1:[rta_geo5] Date:[09/24/14 15:10:38] Token2:[restarting]
Token1:[rta_geo5] Date:[09/24/14 15:10:38] Token2:[memory allocation: 3500 lines]
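This approach is more verbose than the others, but more efficient and maintainable. It also lets you log lines that fail to parse, or fall back to a different parsing method for them.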
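To make the point about handling failures concrete, the same string-method parsing can be wrapped in a Try-style helper that reports failure instead of silently skipping the line. This is only a sketch: TryParseLine, LineParserSketch, and the rejected list are hypothetical names, and what you do with rejected lines (log them, retry with another parser) is up to you.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class LineParserSketch
{
    // Hypothetical helper: returns false when the line does not have the
    // expected "identifier: MM/dd/yy HH:mm:ss - message" shape.
    public static bool TryParseLine(string line, out string source, out DateTime timestamp, out string message)
    {
        source = null;
        message = null;
        timestamp = default(DateTime);

        int colon = line.IndexOf(": ", StringComparison.Ordinal);
        if (colon < 0) return false;

        int dash = line.IndexOf(" - ", colon + 2, StringComparison.Ordinal);
        if (dash < 0) return false;

        string datePart = line.Substring(colon + 2, dash - (colon + 2));
        if (!DateTime.TryParseExact(datePart, "MM/dd/yy HH:mm:ss",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out timestamp))
            return false;

        source = line.Substring(0, colon);
        message = line.Substring(dash + 3);
        return true;
    }

    public static void Main()
    {
        string[] lines = { "rta_geo5: 09/24/14 15:10:38 - restarting", "not a log line" };
        var rejected = new List<string>(); // lines to log or re-parse some other way

        foreach (string line in lines)
        {
            string source, message;
            DateTime timestamp;
            if (TryParseLine(line, out source, out timestamp, out message))
                Console.WriteLine(source + " | " + message);
            else
                rejected.Add(line);
        }

        Console.WriteLine("Rejected lines: " + rejected.Count);
    }
}
```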
Answer 3 (score: 0)
OK, I have thought about this. I am not sure it is 100% right, but try:
(rta_geo5): (.*?) - (.*)
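This should split the line into 3 groups as required. However, it assumes that the leading identifier is always (rta_geo5).
[edit] - I noticed that one of the answers referred to an online regex service, so you can try my regex there: http://regex101.com/r/xF7iD7/1 (sorry, I do not have an account yet, but I will create one now). That said, regarding the rta_geo5 block, you could of course go fully generic with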
(.*): (.*) - (.*)
and see how that works.
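One caveat with the fully generic pattern: its groups are greedy, so a message that itself contains ": " followed later by " - " would be cut in the wrong place. A possible variant, sketched below, makes the first two groups lazy and anchors the pattern. It assumes the input is a single line with no trailing line break; the names GenericPatternSketch and Parse are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GenericPatternSketch
{
    // Lazy (.*?) stops at the FIRST ": " and the FIRST " - " after it;
    // the final greedy (.*) takes the rest of the line as the message.
    private static readonly Regex Line = new Regex(@"^(.*?): (.*?) - (.*)$");

    // Hypothetical helper: returns the three groups, or null if the line does not match.
    public static string[] Parse(string line)
    {
        Match m = Line.Match(line);
        if (!m.Success) return null;
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }

    public static void Main()
    {
        // The message deliberately contains both ": " and " - ".
        string[] parts = Parse("rta_geo5: 09/24/14 15:10:38 - memory: a - b");
        Console.WriteLine(string.Join(" | ", parts));
    }
}
```

With the greedy version, the same input would end up with "rta_geo5: 09/24/14 15:10:38 - memory" in the first group, which is why the lazy quantifiers matter here.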