I am trying to parse a multi-line email so that I can get the data that sits on its own line under each heading in the email body. It looks like this:
EMAIL STARTING IN APRIL
Marketing ID          Local Number
-------------------   ----------------------
GR332230              0000232323
Dispatch Code         Logic code
-----------------     -------------------
GX3472                1
Destination ID        Destination details
-----------------     -------------------
3411144
When I use StringReader.ReadLine, every message box seems to show everything, whereas all I want is the data under each ------ line, as shown above.
Here is my code:
foreach (MailItem mail in publicFolder.Items)
{
    if (mail != null)
    {
        if (mail is MailItem)
        {
            MessageBox.Show(mail.Body, "MailItem body");
            // Creates new StringReader instance from System.IO
            using (StringReader reader = new StringReader(mail.Body))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                    // Loop over the lines in the string.
                    if (mail.Body.Contains("Marketing ID"))
                    {
                        // var localno = mail.Body.Substring(247,15);//not correct approach
                        // MessageBox.Show(localrefno);
                        //MessageBox.Show("found");
                        //var conexid = mail.Body.Replace(Environment.NewLine);
                        var regex = new Regex("<br/>", RegexOptions.Singleline);
                        MessageBox.Show(line.ToString());
                    }
            }
            //var stringBuilder = new StringBuilder();
            //foreach (var s in mail.Body.Split(' '))
            //{
            //    stringBuilder.Append(s).AppendLine();
            //}
            //MessageBox.Show(stringBuilder.ToString());
        }
        else
        {
            MessageBox.Show("Nothing found for MailItem");
        }
    }
}
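As you can see, I have made numerous attempts, even using substring positions and regular expressions. Please help me get the data from each line under the --- lines.
Answer 0: (score: 1)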
// email holds the message body, e.g. mail.Body
var dict = new Dictionary<string, string>();
try
{
    var lines = email.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    int starts = 0, end = 0, length = 0;
    // Find the first dashed line; groups then repeat every three lines
    // (header, dashes, values).
    while (!lines[starts + 1].StartsWith("-")) starts++;
    for (int i = starts + 1; i < lines.Length; i += 3)
    {
        // Each run of dashes marks where a column starts.
        var mc = Regex.Matches(lines[i], @"(?:^| )-");
        foreach (Match m in mc)
        {
            int start = m.Value.StartsWith(" ") ? m.Index + 1 : m.Index;
            // Key: the header text above this run of dashes.
            end = start;
            while (lines[i][end++] == '-' && end < lines[i].Length - 1) ;
            length = Math.Min(end - start, lines[i - 1].Length - start);
            string key = length > 0 ? lines[i - 1].Substring(start, length).Trim() : "";
            // Value: the text below this run of dashes.
            end = start;
            while (lines[i][end++] == '-' && end < lines[i].Length) ;
            length = Math.Min(end - start, lines[i + 1].Length - start);
            string value = length > 0 ? lines[i + 1].Substring(start, length).Trim() : "";
            dict.Add(key, value);
        }
    }
}
catch (Exception ex)
{
    // Keep the original exception as the inner exception for debugging.
    throw new Exception("Email is not in correct format", ex);
}
Using regular expressions:
var dict = new Dictionary<string, string>();
try
{
    var lines = email.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    int starts = 0;
    while (!lines[starts + 1].StartsWith("-")) starts++;
    for (int i = starts + 1; i < lines.Length; i += 3)
    {
        // Header line is above the dashes, value line is below them.
        var keys = Regex.Matches(lines[i - 1], @"(?:^| )(\w+\s?)+");
        var values = Regex.Matches(lines[i + 1], @"(?:^| )(\w+\s?)+");
        if (keys.Count == values.Count)
            for (int j = 0; j < keys.Count; j++)
                dict.Add(keys[j].Value.Trim(), values[j].Value.Trim());
        else // counts differ: one of the two values on this line is missing
        {
            if (lines[i + 1].StartsWith(" "))
            {
                // Leading space: the first column is empty.
                dict.Add(keys[0].Value.Trim(), "");
                dict.Add(keys[1].Value.Trim(), values[0].Value.Trim());
            }
            else
            {
                // Otherwise the second column is empty.
                dict.Add(keys[0].Value.Trim(), values[0].Value.Trim());
                dict.Add(keys[1].Value.Trim(), "");
            }
        }
    }
}
catch (Exception ex)
{
    // Keep the original exception as the inner exception for debugging.
    throw new Exception("Email is not in correct format", ex);
}
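Answer 1: (score: 0)
Here is my attempt. I don't know whether the email format can change (rows, columns, etc.).
I can't think of an easy way to separate the columns other than checking for a double space (my solution).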
The output looks like this:
Marketing ID,GR332230,Local Number,0000232323 Dispatch Code,GX3472,Logic code,1 Destination ID,3411144,Destination details,
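The double-space split described in this answer might look roughly like the sketch below. It is not the answer's original code: `DoubleSpaceParser`, `SplitColumns`, and `Parse` are hypothetical names, and the sketch assumes that columns are separated by at least two spaces (or tabs) and that every dashed line is followed by a value line. A value line whose first column is empty would be misattributed, because leading whitespace is trimmed before splitting.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DoubleSpaceParser
{
    // Hypothetical helper: splits a line into columns wherever two or more
    // spaces or tabs occur in a row.
    private static string[] SplitColumns(string line)
    {
        return Regex.Split(line.Trim(), @"[ \t]{2,}");
    }

    // Pairs each header (line above the dashes) with its value (line below).
    public static List<KeyValuePair<string, string>> Parse(string body)
    {
        string[] lines = body.Split(new[] { "\r\n", "\n", "\r" },
                                    StringSplitOptions.RemoveEmptyEntries);
        var result = new List<KeyValuePair<string, string>>();
        for (int i = 1; i < lines.Length; i++)
        {
            // Only dashed separator lines anchor a header/value group.
            if (!lines[i].StartsWith("-", StringComparison.Ordinal)) continue;

            string[] headers = SplitColumns(lines[i - 1]);
            // Assumption: the line after the dashes holds the values, if it exists.
            string[] values = i + 1 < lines.Length
                ? SplitColumns(lines[i + 1])
                : new string[0];

            for (int c = 0; c < headers.Length; c++)
            {
                string value = c < values.Length ? values[c] : "";
                result.Add(new KeyValuePair<string, string>(headers[c], value));
            }
        }
        return result;
    }
}
```

Calling `DoubleSpaceParser.Parse(mail.Body)` returns the header/value pairs in the order they appear, and trailing columns without a value come back as empty strings.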
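Answer 2: (score: 0)
Here is an approach that assumes you don't need the headers, and that the information arrives in order and is always present. It will not work for data that contains spaces or has optional fields.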
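// Requires: using System.Linq; using System.Text.RegularExpressions;
foreach (MailItem mail in publicFolder.Items)
{
    MessageBox.Show(mail.Body, "MailItem body");
    // Split by line, drop blank lines, skip the title line (assumes the body
    // starts with one, as in the sample), and remove dash lines.
    var data = Regex.Split(mail.Body, @"\r?\n|\r")
        .Where(l => l.Trim().Length > 0)
        .Skip(1)
        .Where(l => !l.StartsWith("-"))
        .ToList();
    // Remove headers: rows now alternate header, values, so walk backwards
    // from the last header and drop every second row.
    for (var i = data.Count - 2; i >= 0; i -= 2)
    {
        data.RemoveAt(i);
    }
    // Now data contains only the info you want, in the order it was presented.
    // Assuming the info doesn't contain spaces.
    var result = data.SelectMany(d => d.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    // WARNING: Missing info will not be present.
    // {"GR332230", "0000232323", "GX3472", "1", "3411144"}
}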
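Answer 3: (score: 0)
Doing this with Regex is not a good idea: it is very easy to forget edge cases, and the pattern is hard to understand and hard to debug. It is also easy to end up with a Regex that pins the CPU until it times out. (I can't comment on the other answers yet, so please at least check my other two cases before you pick a final solution.)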
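One way to limit the CPU risk mentioned above is to give the Regex a match timeout, which .NET supports through the `Regex(string, RegexOptions, TimeSpan)` constructor and `RegexMatchTimeoutException`. A minimal sketch follows; `SafeRegex` and `TryMatches` are hypothetical names, and returning `null` on timeout is just one possible design choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafeRegex
{
    // Hypothetical helper: runs the pattern with a time budget and returns
    // null instead of hanging when the budget is exceeded.
    public static MatchCollection TryMatches(string input, string pattern, TimeSpan budget)
    {
        var regex = new Regex(pattern, RegexOptions.None, budget);
        try
        {
            MatchCollection matches = regex.Matches(input);
            // Matches are evaluated lazily; reading Count forces the full scan
            // so that a timeout surfaces here rather than later in the caller.
            int count = matches.Count;
            return count >= 0 ? matches : null;
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }
}
```

A caller could then treat `null` as "pattern too expensive for this input" and fall back to a simpler line-by-line parser.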
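In your case, the Regex solution below works for the example you provided. It does have some extra limitations, though: you need to make sure there are no empty values in a column that is neither the first nor the last. In other words, if there are more than two columns and any one in the middle is empty, the names and values on that line will be mismatched.
Unfortunately, I can't give you a non-Regex solution because I don't know the spec. For example: will there be blanks? Will there be tabs? Does each field have a fixed number of characters, or is the width flexible? If it is flexible and values can be empty, what rule decides which columns are empty? I think it is quite likely that the columns are defined by the length of the column name and that only spaces are used as delimiters. If that is the case, there are two ways to solve this: a two-pass Regex, or writing your own parser. If all fields have a fixed length it is even easier: just cut the lines with Substring and then trim the pieces.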
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Program
{
    public class Record
    {
        public string Name { get; set; }
        public string Value { get; set; }
    }

    public static void Main()
    {
        var regex = new Regex(@"(?<name>((?!-)[\w]+[ ]?)*)(?>(?>[ \t]+)?(?<name>((?!-)[\w]+[ ]?)+)?)+(?:\r\n|\r|\n)(?>(?<splitters>(-+))(?>[ \t]+)?)+(?:\r\n|\r|\n)(?<value>((?!-)[\w]+[ ]?)*)(?>(?>[ \t]+)?(?<value>((?!-)[\w]+[ ]?)+)?)+", RegexOptions.Compiled);
        var testingValue =
@"EMAIL STARTING IN APRIL
Marketing ID          Local Number
-------------------   ----------------------
GR332230              0000232323
Dispatch Code         Logic code
-----------------     -------------------
GX3472                1
Destination ID        Destination details
-----------------     -------------------
3411144";
        var matches = regex.Matches(testingValue);
        var rows = (
            from match in matches.OfType<Match>()
            let row = (
                from grp in match.Groups.OfType<Group>()
                select new { grp.Name, Captures = grp.Captures.OfType<Capture>().ToList() }
            ).ToDictionary(item => item.Name, item => item.Captures.OfType<Capture>().ToList())
            let names = row.ContainsKey("name") ? row["name"] : null
            let splitters = row.ContainsKey("splitters") ? row["splitters"] : null
            let values = row.ContainsKey("value") ? row["value"] : null
            where names != null && splitters != null &&
                  names.Count == splitters.Count &&
                  (values == null || values.Count <= splitters.Count)
            select new { Names = names, Values = values }
        );
        var records = new List<Record>();
        foreach (var row in rows)
        {
            for (int i = 0; i < row.Names.Count; i++)
            {
                records.Add(new Record { Name = row.Names[i].Value, Value = i < row.Values.Count ? row.Values[i].Value : "" });
            }
        }
        foreach (var record in records)
        {
            Console.WriteLine(record.Name + " = " + record.Value);
        }
    }
}
Output:
Marketing ID = GR332230
Local Number = 0000232323
Dispatch Code = GX3472
Logic code = 1
Destination ID = 3411144
Destination details =
Note that this also works for a message like this one:
EMAIL STARTING IN APRIL
Marketing ID          Local Number
-------------------   ----------------------
GR332230              0000232323
Dispatch Code         Logic code
-----------------     -------------------
GX3472                1
Destination ID        Destination details
-----------------     -------------------
                      3411144
Output:
Marketing ID = GR332230
Local Number = 0000232323
Dispatch Code = GX3472
Logic code = 1
Destination ID =
Destination details = 3411144
Or this one:
EMAIL STARTING IN APRIL
Marketing ID          Local Number
-------------------   ----------------------

Dispatch Code         Logic code
-----------------     -------------------
GX3472                1
Destination ID        Destination details
-----------------     -------------------
                      3411144
Output:
Marketing ID =
Local Number =
Dispatch Code = GX3472
Logic code = 1
Destination ID =
Destination details = 3411144
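For comparison, the "write your own parser" option mentioned earlier in this answer could be sketched as follows. This is only an illustration under an assumption the question does not confirm: that each column starts where its run of dashes starts and extends to the start of the next run. `DashColumnParser`, `IsSeparator`, `Cut`, and `Parse` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DashColumnParser
{
    // A separator row contains dashes and nothing but dashes, spaces, or tabs.
    private static bool IsSeparator(string line)
    {
        return line.Contains("-") && line.Trim(' ', '\t', '-').Length == 0;
    }

    // Cuts one column out of a line that may be shorter than the column.
    private static string Cut(string line, int start, int width)
    {
        if (start >= line.Length) return "";
        return line.Substring(start, Math.Min(width, line.Length - start)).Trim();
    }

    public static List<KeyValuePair<string, string>> Parse(string body)
    {
        // Blank lines are kept on purpose: an empty value row is meaningful.
        string[] lines = Regex.Split(body, @"\r\n|\r|\n");
        var records = new List<KeyValuePair<string, string>>();
        for (int i = 1; i < lines.Length; i++)
        {
            if (!IsSeparator(lines[i])) continue;
            string header = lines[i - 1];
            // If the line after next is another separator, the next line is
            // already the following header, so this group has no value row.
            bool hasValues = i + 1 < lines.Length
                && !(i + 2 < lines.Length && IsSeparator(lines[i + 2]));
            string values = hasValues ? lines[i + 1] : "";

            MatchCollection runs = Regex.Matches(lines[i], "-+");
            for (int c = 0; c < runs.Count; c++)
            {
                int start = runs[c].Index;
                // The last column extends to the end of the line.
                int end = c + 1 < runs.Count ? runs[c + 1].Index : int.MaxValue;
                records.Add(new KeyValuePair<string, string>(
                    Cut(header, start, end - start),
                    Cut(values, start, end - start)));
            }
        }
        return records;
    }
}
```

Because the column positions come from the dashed line rather than from the values, an empty first or middle column stays empty instead of shifting the remaining values to the left.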