I am trying to extract information from a string: a particular Fortran format string. The string is formatted like this:
F8.3, I5, 3(5X, 2(A20,F10.3)), 'XXX'
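The format fields are delimited by "," and by bracketed groups of fields, where the number in front of a bracket is how many consecutive times the enclosed pattern is repeated. So the string above expands to: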
F8.3, I5, 5X, A20,F10.3, A20,F10.3, 5X, A20,F10.3, A20,F10.3, 5X, A20,F10.3, A20,F10.3, 'XXX'
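I am trying to write something in C# that can expand a string conforming to this pattern. I have started going at it with lots of switch and if statements, but I am wondering whether I am going about it the wrong way.
Basically, I would like to know whether some regex wizard thinks a regular expression could do this in one neat fell swoop. I know nothing about regular expressions, but if they could solve my problem I am willing to put in some time to learn how to use them. On the other hand, if regular expressions can't handle this, I would rather spend my time looking at another method.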
Answer 0 (score: 1)
This should be doable with Regex :) I've expanded my previous example, and it tests out fine with your example.
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Regex to match the innermost patterns of the form n(X) and capture the values of n and X.
private static readonly Regex matcher = new Regex(@"(\d+)\(([^(]*?)\)", RegexOptions.None);

// Create a new string by repeating X n times, separated by ','.
private static string Join(Match m)
{
    var n = Convert.ToInt32(m.Groups[1].Value); // get value of n
    var x = m.Groups[2].Value; // get value of X
    return String.Join(",", Enumerable.Repeat(x, n));
}

// Expand the string by recursively replacing the innermost occurrences of n(X).
private static string Expand(string text)
{
    var s = matcher.Replace(text, Join);
    return (matcher.IsMatch(s)) ? Expand(s) : s;
}

// Parse a string for occurrences of the n(X) pattern and expand them.
// Return the string as a tokenized array.
public static string[] Parse(string text)
{
    // Check that the number of parentheses is even.
    if (text.Sum(c => (c == '(' || c == ')') ? 1 : 0) % 2 == 1)
        throw new ArgumentException("The string contains an odd number of parentheses.");
    // Note: splitting on ' ' also splits quoted literals that contain spaces.
    return Expand(text).Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
}
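The parity check in Parse only counts parentheses, so inputs such as ")(" or "((" pass it even though they are not properly nested. A minimal sketch of a stricter check follows; the class and method names (ParenCheck, IsBalanced) are hypothetical and not part of the answer above, and the sketch assumes that quoted literals in the format string contain no parentheses.

```csharp
using System;

public static class ParenCheck
{
    // Returns true only if every ')' closes an earlier '(' and no '(' is left open.
    // Assumption: quoted literals such as 'XXX' contain no parentheses.
    public static bool IsBalanced(string text)
    {
        int depth = 0;
        foreach (char c in text)
        {
            if (c == '(')
            {
                depth++;
            }
            else if (c == ')')
            {
                depth--;
                if (depth < 0)
                    return false; // a ')' with no matching '('
            }
        }
        return depth == 0;
    }
}
```

Parse could call ParenCheck.IsBalanced(text) in place of the parity test and throw the same ArgumentException when it returns false.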
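Answer 1 (score: 0)
Personally, I would suggest using a recursive function. Every time you hit an opening parenthesis, call the function again to parse that part. I'm not sure whether you can use a regex to match a recursive data structure.
(Edit: removed an incorrect regex)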
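As a rough illustration of the recursive idea, here is a minimal sketch rather than a definitive implementation. The names RecursiveExpander, Expand and ParseList are hypothetical. It assumes that fields are separated by commas, that a group may carry an optional leading repeat count (treated as 1 when absent), and that single-quoted literals may contain commas or parentheses.

```csharp
using System;
using System.Collections.Generic;

public static class RecursiveExpander
{
    // Expands repeat groups such as 3(5X,2(A20)) into a flat list joined with ", ".
    public static string Expand(string format)
    {
        int pos = 0;
        List<string> items = ParseList(format, ref pos, topLevel: true);
        return string.Join(", ", items);
    }

    // Parses fields until the ')' that closes the current group,
    // or until the end of the input when called for the top level.
    private static List<string> ParseList(string s, ref int pos, bool topLevel)
    {
        var items = new List<string>();
        while (pos < s.Length)
        {
            char c = s[pos];
            if (c == ',' || char.IsWhiteSpace(c)) { pos++; continue; }
            if (c == ')')
            {
                if (topLevel)
                    throw new FormatException("Unexpected ')' at position " + pos);
                pos++; // consume ')'
                return items;
            }

            // Read an optional repeat count.
            int start = pos;
            while (pos < s.Length && char.IsDigit(s[pos])) pos++;

            if (pos < s.Length && s[pos] == '(')
            {
                int repeat = pos > start ? int.Parse(s.Substring(start, pos - start)) : 1;
                pos++; // consume '('
                List<string> group = ParseList(s, ref pos, topLevel: false);
                for (int i = 0; i < repeat; i++) items.AddRange(group);
            }
            else
            {
                // Not a group: any digits read belong to a plain field such as 5X.
                pos = start;
                bool inQuote = false;
                while (pos < s.Length && (inQuote || (s[pos] != ',' && s[pos] != ')')))
                {
                    if (s[pos] == '\'') inQuote = !inQuote;
                    pos++;
                }
                items.Add(s.Substring(start, pos - start).Trim());
            }
        }
        if (!topLevel)
            throw new FormatException("Missing ')' before the end of the string");
        return items;
    }
}
```

Calling RecursiveExpander.Expand on the format string from the question should give the expanded list shown there, with every separator normalized to ", ". Each nested call consumes its own closing parenthesis, so the caller resumes right after the group.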
Answer 2 (score: 0)
I would suggest using a recursive method like the example below (not tested):
ResultData Parse(String value, ref Int32 index)
{
    ResultData result = new ResultData();
    Int32 startIndex = index; // Used to get substrings
    while (index < value.Length)
    {
        Char current = value[index];
        if (current == '(')
        {
            // Step past the bracket and parse the nested group recursively
            index++;
            result.Add(Parse(value, ref index));
            startIndex = index;
            continue;
        }
        if (current == ')')
        {
            // Push last result
            index++;
            return result;
        }
        // Process all other chars here
        index++; // advance, otherwise the loop never terminates
    }
    // We can't find the closing bracket
    // (a top-level call also ends up here unless the input itself is wrapped in brackets)
    throw new Exception("String is not valid");
}
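You may need to modify some parts of the code, but I used this approach when writing a simple compiler. It is not complete, though; it is just an example.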
Answer 3 (score: 0)
I ended up rewriting this today. It turns out it can be done in a single method:
private static string ExpandBrackets(string Format)
{
    int maxLevel = CountNesting(Format);
    // Expand the most deeply nested groups first, then work outwards
    for (int currentLevel = maxLevel; currentLevel > 0; currentLevel--)
    {
        int level = 0;
        int start = 0;
        int end = 0;
        for (int i = 0; i < Format.Length; i++)
        {
            switch (Format[i])
            {
                case '(':
                    level++;
                    if (level == currentLevel)
                    {
                        string group = string.Empty;
                        string digits = string.Empty;
                        // Isolate the number of repeats, if any, by scanning back to the previous comma.
                        // If there are 0 repeats then set to 1, so the group is replaced by itself with the brackets removed.
                        // Assumes Format contains no whitespace: only digits may sit between the comma and the bracket.
                        for (int j = i - 1; j >= 0; j--)
                        {
                            char c = Format[j];
                            if (c == ',')
                            {
                                start = j + 1;
                                break;
                            }
                            if (char.IsDigit(c))
                                digits = c + digits; // prepend, so counts such as 10 keep their trailing zero
                            else
                                throw new Exception("Non-numeric character " + c + " found in front of the brackets");
                        }
                        int repeat = digits.Length > 0 ? int.Parse(digits) : 0;
                        if (repeat == 0)
                            repeat = 1;
                        // Isolate the format group.
                        // Parse until the first closing bracket. Level is decremented as this effectively takes us down one level.
                        for (int j = i + 1; j < Format.Length; j++)
                        {
                            char c = Format[j];
                            if (c == ')')
                            {
                                level--;
                                end = j;
                                break;
                            }
                            group += c;
                        }
                        // Substitute the expanded group for the original group in the format string.
                        // If the group is empty then just remove it from the string.
                        if (string.IsNullOrEmpty(group))
                        {
                            Format = Format.Remove(start - 1, end - start + 2);
                            i = start;
                        }
                        else
                        {
                            string repeatedGroup = RepeatString(group, repeat);
                            Format = Format.Remove(start, end - start + 1).Insert(start, repeatedGroup);
                            i = start + repeatedGroup.Length - 1;
                        }
                    }
                    break;
                case ')':
                    level--;
                    break;
            }
        }
    }
    return Format;
}
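CountNesting() returns the highest level of bracket nesting in the format statement, although this could instead be passed in to the method as a parameter. RepeatString() just repeats a string the specified number of times, so that the result can be substituted for the bracketed group in the format string.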
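The answer does not show the two helpers, so the following is only a guess at what they might look like. The wrapper class name FormatHelpers is hypothetical. The comma separator in RepeatString is also an assumption: the answer does not state it, but the repeated fields need a delimiter between them for the expanded format string to stay valid.

```csharp
using System;
using System.Linq;

public static class FormatHelpers
{
    // Returns the deepest level of bracket nesting,
    // e.g. 2 for "3(5X,2(A20,F10.3))" and 0 for a string without brackets.
    public static int CountNesting(string format)
    {
        int level = 0;
        int max = 0;
        foreach (char c in format)
        {
            if (c == '(')
            {
                level++;
                max = Math.Max(max, level);
            }
            else if (c == ')')
            {
                level--;
            }
        }
        return max;
    }

    // Repeats the group the given number of times.
    // Assumption: copies are separated by ',' so they remain distinct fields.
    public static string RepeatString(string group, int repeat)
    {
        return string.Join(",", Enumerable.Repeat(group, repeat));
    }
}
```

With these in scope, ExpandBrackets works from the deepest nesting level outwards, so by the time an outer group is repeated its inner groups have already been flattened.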