My text looks like this:
Title A
some description on a few lines, there may be empty lines here
some description on a few lines
Status: some random text
Title B
some description on a few lines, there may be empty lines here
some description on a few lines
Status: some other random text
Title C
some description on a few lines, there may be empty lines here
some description on a few lines
Status: some other random text
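I want to parse the text, using the "Status:" lines as the delimiter, and get a list of items where each item holds a title, its description lines, and a status. I am using C# 4.0.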
Answer 0: (score: 1)
This is how I would do it (assuming the text is read from a file):
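// Requires: using System.IO; using System.Text.RegularExpressions;
Regex regStatus = new Regex(@"^Status:");
// Title lines in the sample look like "Title A" (no colon), so match "Title " rather than "Title:"
Regex regTitle = new Regex(@"^Title ");
string line;
string[] descriptionLine;
string[] statusLine;
string[] titleLine;
using (TextReader reader = File.OpenText("file.txt"))
{
    // ReadLine returns null once the end of the file is reached
    while ((line = reader.ReadLine()) != null)
    {
        if (regStatus.IsMatch(line))
        {
            // status line, convert to array, can drop first element as it is "Status:"
            statusLine = line.Split(' ');
            // do stuff with array
        }
        else if (regTitle.IsMatch(line))
        {
            // title line, convert to array, can drop first element as it is "Title"
            titleLine = line.Split(' ');
            // do stuff with array
        }
        else
        {
            // description line, so just split into array
            descriptionLine = line.Split(' ');
            // do stuff with array
        }
    }
}
You can then take the arrays and store them in a class of your own as needed; I'll leave that for you to figure out. This just uses a simple regular expression to check whether a line starts with "Status:" or "Title ". To be honest, you don't even need a regex. You could just do:
if (line.StartsWith("Status:")) {}
if (line.StartsWith("Title ")) {}
to check whether each line starts with a status or a title.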
Answer 1: (score: 1)
If the content is structured the way you describe, you can buffer the text:
// The items end with a "Status:" line, so that is the prefix to match
string myRegEx = "^Status:.*$";
// loop through each line in text
foreach (string line in lines)
{
    if (System.Text.RegularExpressions.Regex.IsMatch(line, myRegEx))
    {
        // save the buffer into array
        // clear the buffer
    }
    else
    {
        // save the text into the buffer
    }
}
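The buffering idea above can be sketched as a small program. This is only one possible reading of the answer: the names BufferParser and Parse are hypothetical, the Status line is kept as the last line of each block, and any lines after the final Status line are simply dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BufferParser
{
    // Hypothetical helper (not part of the answer): returns one block of lines per item.
    public static List<List<string>> Parse(IEnumerable<string> lines)
    {
        var statusLine = new Regex(@"^Status:");
        var blocks = new List<List<string>>();
        var buffer = new List<string>();

        foreach (string line in lines)
        {
            buffer.Add(line);
            if (statusLine.IsMatch(line))
            {
                // A Status line closes the current item: save the buffer and start a new one.
                blocks.Add(buffer);
                buffer = new List<string>();
            }
        }
        // Assumption: lines after the last Status line are ignored in this sketch.
        return blocks;
    }

    public static void Main()
    {
        string[] lines =
        {
            "Title A", "first line", "", "second line", "Status: open",
            "Title B", "Status: closed"
        };
        foreach (List<string> block in Parse(lines))
        {
            // The first line of a block is the title, the last one is the status.
            Console.WriteLine("{0} -> {1} ({2} lines)", block[0], block[block.Count - 1], block.Count);
        }
    }
}
```

Each block still contains raw lines; turning the first and last line into a title and a status is left to the caller.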
Answer 2: (score: 1)
Declare an item type:
public class Item
{
public string Title { get; set; }
public string Status { get; set; }
public string Description { get; set; }
}
Then split the text into lines:
string[] lines = text.Split(new[] { "\r\n" }, StringSplitOptions.None);
Or read the lines from a file using:
string[] lines = File.ReadAllLines(path);
Create a list of items that will store the result:
var result = new List<Item>();
Now we can do the parsing:
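// Initialized to null so the compiler does not reject the later uses as "unassigned local variable"
Item item = null;
for (int i = 0; i < lines.Length; i++) {
    string line = lines[i];
    if (line.StartsWith("Title ")) {
        // A title line starts a new item; "Title " is 6 characters long
        item = new Item();
        result.Add(item);
        item.Title = line.Substring(6);
    } else if (line.StartsWith("Status: ")) {
        // "Status: " is 8 characters long
        item.Status = line.Substring(8);
    } else { // Description
        if (item.Description != null) {
            item.Description += "\r\n";
        }
        item.Description += line;
    }
}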
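Note that this solution has no error handling. The code assumes the input text is always well formed; for example, a Status or description line that appears before the first Title line is not handled.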
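For input that may not be well formed, a few checks can be added. The following is a minimal sketch rather than the answer's own code: the ItemParser class, the choice of FormatException, and the rule that every item must have a Status line are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public class Item
{
    public string Title { get; set; }
    public string Status { get; set; }
    public string Description { get; set; }
}

public static class ItemParser
{
    // Hypothetical helper: same parsing idea as above, but malformed input raises FormatException.
    public static List<Item> Parse(string[] lines)
    {
        var result = new List<Item>();
        Item item = null;
        for (int i = 0; i < lines.Length; i++)
        {
            string line = lines[i];
            if (line.StartsWith("Title ", StringComparison.Ordinal))
            {
                // Assumption: an item without a Status line is an error.
                if (item != null && item.Status == null)
                    throw new FormatException("Line " + (i + 1) + ": previous item has no Status line.");
                item = new Item { Title = line.Substring(6) };
                result.Add(item);
            }
            else if (item == null)
            {
                // Blank lines before the first title are skipped; anything else is an error.
                if (line.Trim().Length == 0) continue;
                throw new FormatException("Line " + (i + 1) + ": expected a Title line first.");
            }
            else if (line.StartsWith("Status:", StringComparison.Ordinal))
            {
                item.Status = line.Substring(7).Trim();
            }
            else
            {
                // Description line: append, separating lines with CRLF.
                item.Description = item.Description == null ? line : item.Description + "\r\n" + line;
            }
        }
        if (item != null && item.Status == null)
            throw new FormatException("Last item has no Status line.");
        return result;
    }

    public static void Main()
    {
        string[] lines = { "Title A", "line 1", "line 2", "Status: open", "Title B", "Status: closed" };
        foreach (Item it in Parse(lines))
        {
            Console.WriteLine("{0} [{1}]", it.Title, it.Status);
        }
    }
}
```

Whether to throw, skip the bad item, or collect the errors depends on where the text comes from; throwing is simply the shortest option to show here.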
Answer 3: (score: 0)
string data = @"Title A
Status: Nothing But Net!
Title B
some description on a few lines, there may be empty lines here
some description on a few lines
Status: some other random text
Title C
Can't stop the invisible Man
Credo Quia Absurdium Est
Status: C Status";
string pattern = @"
^(?:Title\s+)
(?<Title>[^\s]+)
(?:[\r\n\s]+)
(?<Description>.*?)
(?:^Status:\s*)
(?<Status>[^\r\n]+)
";
// IgnorePatternWhitespace just lets us spread the pattern over multiple lines (and comment it).
Regex.Matches(data, pattern, RegexOptions.Singleline | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => new
{
Title = mt.Groups["Title"].Value,
Description = mt.Groups["Description"].Value.Trim(),
Status = mt.Groups["Status"].Value.Trim()
})
.ToList() // This is here just to do the display of the output
.ForEach(item => Console.WriteLine ("Title {0}: ({1}) and this description:{3}{2}{3}", item.Title, item.Status, item.Description, Environment.NewLine));
Output:
Title A: (Nothing But Net!) and this description:
Title B: (some other random text) and this description:
some description on a few lines, there may be empty lines here
some description on a few lines
Title C: (C Status) and this description:
Can't stop the invisible Man
Credo Quia Absurdium Est