Here is some real sample data:
string s1 = "CLR DRBR|r 0004 BLCK|r 0006 WHIT|r 0006"
string s2 = "WGT WHGN|c 0004 YLGN|c 0006"
string s3 = "296 312|d 0004 137.2|n 0006"
string s4 = "HGT SH|r 0004"
string s5 = "ANLP ANLP1 PNPL|r 0004"
The data is always in the following format: [Group] [Value][Pipe + letter][Key]
The [Value][Pipe + letter][Key] part may repeat several times.
Is there a way to split this kind of data into the following:
string out1[] = { "CLR", "DRBR", "|r 0004", "BLCK", "|r 0006", "WHIT", "|r 0006" }
string out2[] = { "WGT", "WHGN", "|c 0004", "YLGN", "|c 0006" }
string out3[] = { "296", "312", "|d 0004", "137.2", "|n 0006" }
string out4[] = { "HGT", "SH", "|r 0004" }
string out5[] = { "ANLP", "ANLP1 PNPL", "|r 0004" }
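Note that the pattern of s5 differs slightly from the others: its value ("ANLP1 PNPL") contains a space.
This is legacy data from the 1960s, so please don't ask me how or why the data is stored this way. Thanks.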
Answer 0 (score: 1)
Looking at the data, you seem to have the following rules:
Phase 1: Read to the first space, split there, and remove the space.
Phase 2: Read to `|` and split just before the `|`.
Phase 3: Keep the `|` and the two characters after it (the letter and the space), then read to the next space or EOT; split there and remove the space if it exists.
Go to Phase 2 if there is more data.
Something like this (you will probably want more error checking than I have put in):
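// Written for LINQPad, which imports System and System.Collections.Generic
// and provides the Dump() output helper.
void Main()
{
    string s1 = "CLR DRBR|r 0004 BLCK|r 0006 WHIT|r 0006";
    string s2 = "WGT WHGN|c 0004 YLGN|c 0006";
    string s3 = "296 312|d 0004 137.2|n 0006";
    string s4 = "HGT SH|r 0004";
    string s5 = "ANLP ANLP1 PNPL|r 0004";
    splitit(s1).Dump();
}

string[] splitit(string input)
{
    List<string> output = new List<string>();
    int index = 0;

    // Phase 1: the group is everything up to the first space.
    while (index < input.Length && input[index] != ' ') index++;
    output.Add(input.Substring(0, index));

    // Skip the separating space(s).
    while (index < input.Length && input[index] == ' ') index++;
    int indexTmp = index;

    while (index < input.Length)
    {
        // Phase 2: the value runs up to, but not including, the next '|'.
        while (index < input.Length && input[index] != '|') index++;
        output.Add(input.Substring(indexTmp, index - indexTmp));
        if (index >= input.Length) break; // no '|' found: malformed tail

        // Phase 3: keep '|', the letter and the space, then read the key
        // up to the next space or the end of the input.
        indexTmp = index;
        index = Math.Min(index + 3, input.Length);
        while (index < input.Length && input[index] != ' ') index++;
        output.Add(input.Substring(indexTmp, index - indexTmp));

        // Skip the space(s) before the next value.
        while (index < input.Length && input[index] == ' ') index++;
        indexTmp = index;
    }
    return output.ToArray();
}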
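For comparison, the three phases above can also be written with a regular expression. This is not the code from this answer, only a minimal sketch under the assumption that every tag is exactly a `|`, one non-space character, one space, and then a key without spaces. The names `LegacySplitter`, `Split`, and `Pair` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LegacySplitter
{
    // Group 1: the value, i.e. everything up to the next '|'.
    // Group 2: the tag, i.e. '|' + one character + space + key (assumed format).
    static readonly Regex Pair = new Regex(@"\s*([^|]+)(\|\S \S+)");

    public static string[] Split(string input)
    {
        var output = new List<string>();

        // Phase 1: the group is everything before the first space.
        int firstSpace = input.IndexOf(' ');
        if (firstSpace < 0) return new[] { input };
        output.Add(input.Substring(0, firstSpace));

        // Phases 2 and 3: repeated value/tag pairs after the group.
        foreach (Match m in Pair.Matches(input, firstSpace))
        {
            output.Add(m.Groups[1].Value.Trim());
            output.Add(m.Groups[2].Value);
        }
        return output.ToArray();
    }

    static void Main()
    {
        string[] parts = Split("ANLP ANLP1 PNPL|r 0004");
        Console.WriteLine(string.Join(", ", parts));
        // prints: ANLP, ANLP1 PNPL, |r 0004
    }
}
```

Because the value is defined as "anything that is not a pipe", the space inside the s5 value needs no special handling. Whether this is easier to maintain than the index-based loop is a matter of taste.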
Answer 1 (score: 0)
You already have an accepted answer, but since you said my approach wouldn't work, here is what I meant:
int index;
List<string[]> output = new List<string[]>();
List<string> current = null;
string[] fields;
//i imagine this will be in an array when you read it in from a file
string[] input = new string[5];
input[0] = "CLR DRBR|r 0004 BLCK|r 0006 WHIT|r 0006";
input[1] = "WGT WHGN|c 0004 YLGN|c 0006";
input[2] = "296 312|d 0004 137.2|n 0006";
input[3] = "HGT SH|r 0004";
input[4] = "ANLP ANLP1 PNPL|r 0004";
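Now you just loop through each record, handle its first field specially, and then check each subsequent field for a second space and process it accordingly.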
bool first = true;
//loop through each of the input records
foreach (string record in input)
{
    //split the input records based on the pipe character
    fields = record.Split("|".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    //loop through each of the fields
    foreach (string field in fields)
    {
        if (first) //split the first field based on the first space in field
        {
            current = new List<string>();
            index = field.IndexOf(" ");
            current.Add(field.Substring(0, index).Trim());
            current.Add(field.Substring(index + 1).Trim());
            first = false;
        }
        else //split subsequent fields based on second space if it exists
        {
            index = field.IndexOf(" ", 3);
            if (index == -1)
            {
                current.Add("|" + field);
            }
            else
            {
                current.Add("|" + field.Substring(0, index).Trim());
                current.Add(field.Substring(index + 1).Trim());
            }
        }
    }
    //control break processing
    first = true;
    output.Add(current.ToArray());
}
You could easily refactor the inner loop into a separate function. I think this approach would turn out faster if you test it.
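As a rough idea of what that refactoring might look like, here is a sketch that moves the inner loop into its own function. The names `RecordSplitter` and `SplitRecord` are made up for illustration, and the two guards (a first field without a space, a field shorter than three characters) are assumptions added here, since the sample data never exercises them.

```csharp
using System;
using System.Collections.Generic;

static class RecordSplitter
{
    // Hypothetical helper: splits one record using the same logic as the inner loop above.
    public static string[] SplitRecord(string record)
    {
        var current = new List<string>();
        string[] fields = record.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < fields.Length; i++)
        {
            string field = fields[i];
            if (i == 0)
            {
                // First field: split group and first value on the first space.
                int index = field.IndexOf(' ');
                if (index < 0) { current.Add(field); continue; } // assumed guard
                current.Add(field.Substring(0, index).Trim());
                current.Add(field.Substring(index + 1).Trim());
            }
            else
            {
                // Later fields: "x 0004 NEXT"; the second space separates key and next value.
                int index = field.Length > 3 ? field.IndexOf(' ', 3) : -1; // assumed guard
                if (index == -1)
                {
                    current.Add("|" + field);
                }
                else
                {
                    current.Add("|" + field.Substring(0, index).Trim());
                    current.Add(field.Substring(index + 1).Trim());
                }
            }
        }
        return current.ToArray();
    }

    static void Main()
    {
        string[] parts = SplitRecord("296 312|d 0004 137.2|n 0006");
        Console.WriteLine(string.Join(", ", parts));
        // prints: 296, 312, |d 0004, 137.2, |n 0006
    }
}
```

Using the loop counter instead of the `first` flag removes the need to reset state between records, so the outer loop reduces to calling `SplitRecord` once per input line.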