Question

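I wrote the code below to parse a file with the following format.

Please tell me whether the code can be cleaned up, preferably by using lambda expressions instead of for loops.

By the time the method runs, the contents of the file have already been read into a StringBuilder variable using the StreamReader class.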

Format

Code
The parser value passed in from the calling function is "Other Total".


The returned strings look like this:
Page [0]

Page [1]

Answer 1

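It seems you could use the overload of IndexOf that takes a starting position. That lets you loop over the string, starting each search where the previous one ended, and split at the first newline after "Other Total:".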

private List<string> ParseObject(StringBuilder body, string parser)
{
    List<string> pages = new List<string>();

    string data = body.ToString();
    int splitPos = 0;
    int startPos = 0;
    while (true)
    {
        // Search the position of the parser string starting from the
        // previous found pos
        int parserPos = data.IndexOf(parser, splitPos);
        if (parserPos != -1)
        {
            // Now we search the position of the newline after the 
            // found parser pos
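            splitPos = data.IndexOf(Environment.NewLine, parserPos);

            // No newline after the last marker: take the rest of the string
            // (otherwise Substring would be called with a negative length)
            if (splitPos == -1)
                splitPos = data.Length;

            // Take the substring starting from the previous position up to
            // the current newline position
            pages.Add(data.Substring(startPos, splitPos - startPos).Trim());

            // reposition to the new starting position for IndexOf
            startPos = splitPos;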
        }
        else
            break;
    }
    return pages;
}

You call it with

var result = ParseObject(input, "Other Total:");

Note that the method has to return the list of pages, otherwise the call is useless.
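To see the IndexOf overload on its own, here is a minimal sketch that only collects the positions of each marker. The helper name FindAll and the sample text are made up for illustration; they are not part of the question's file format.

```csharp
using System;
using System.Collections.Generic;

static class IndexOfDemo
{
    // Hypothetical helper: returns the start index of every occurrence of marker.
    public static List<int> FindAll(string text, string marker)
    {
        var positions = new List<int>();
        int from = 0;
        while (true)
        {
            // IndexOf(value, startIndex, comparison) resumes the search at 'from'
            int pos = text.IndexOf(marker, from, StringComparison.Ordinal);
            if (pos == -1)
                break;
            positions.Add(pos);
            // Continue after this occurrence so it is not found again
            from = pos + marker.Length;
        }
        return positions;
    }

    static void Main()
    {
        // Assumed sample input, not the real file
        string text = "a\nOther Total: 1\nb\nOther Total: 2\n";
        Console.WriteLine(string.Join(", ", FindAll(text, "Other Total:")));
    }
}
```

The important part is advancing the start index past each hit; passing the same index again would find the same occurrence forever.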

Answer 2

A regex approach:

Test: https://regex101.com/r/HP1ufT/1/

List<string> pages = new List<string>();

string pattern = @"^(.*?Other Total:.*?)$";
MatchCollection matches = Regex.Matches(lines, pattern,
  RegexOptions.Singleline | RegexOptions.Multiline);

foreach (Match match in matches)
{
    pages.Add(match.Groups[1].Value);
}
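Since the question asks for lambdas rather than loops, the foreach can probably be folded into a LINQ projection. The following is a sketch under assumptions: SplitPages is a name I made up, the sample text is invented, and Regex.Escape is added so that a marker containing regex metacharacters is still matched literally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexPagesDemo
{
    // Hypothetical helper: one entry per block ending on the line that contains marker.
    public static List<string> SplitPages(string text, string marker)
    {
        string pattern = "^(.*?" + Regex.Escape(marker) + ".*?)$";

        // Singleline lets '.' cross newlines; Multiline lets '^' and '$' match at line boundaries
        return Regex.Matches(text, pattern, RegexOptions.Singleline | RegexOptions.Multiline)
            .Cast<Match>()                          // MatchCollection is non-generic on older frameworks
            .Select(m => m.Groups[1].Value.Trim())  // Trim also drops a trailing '\r' from CRLF files
            .ToList();
    }

    static void Main()
    {
        // Assumed sample input, not the real file
        string text = "Item A 10\nOther Total: 10\nItem B 20\nOther Total: 20\n";
        foreach (string page in SplitPages(text, "Other Total:"))
            Console.WriteLine("[" + page + "]");
    }
}
```

Because both quantifiers are lazy, each match stops at the end of the first line containing the marker, so every page ends with its own "Other Total:" line.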

Parse a file on a specified string
