Question

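I think the title already says what I want to do, so let me get to the point:

I have a .txt file with URL links, and the source code of each link is to be parsed with a regular expression.

The source code of each link is fetched like this: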

public static string getSourceCode(string url)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    StreamReader sr = new StreamReader(resp.GetResponseStream());
    string sourceCode = sr.ReadToEnd();
    sr.Close();
    resp.Close();
    return sourceCode;
}

The source code of every page contains text like the following:

..code..
..code..
    <p class="content">

                                exampleexampleexample

                                        </p>
..code..
..code..
    <p class="content">

                                example

                                        </p>
..code..
..code..

There are more content elements than the two shown.

I get the text of each content element with:

Regex k = new Regex(@"<p class=""content"">[\r\n\s]*(\S.*)");
var g = k.Matches(sourceCode);

Now I can easily extract each match (the collection is zero-indexed):

g[0].ToString() <-- first match
g[1].ToString() <-- second match
g[2].ToString() <-- third match

and so on.
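As a side note on what each match holds: ToString() (or Value) returns the whole match, including the opening tag and the whitespace after it, while the text captured by the parentheses is in Groups[1]. The following is a minimal sketch of that difference, assuming the content pattern from the sample HTML above; the class name CaptureGroupDemo, the method ExtractContents and the sample string are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class CaptureGroupDemo
{
    // Returns the captured text (group 1) of every content paragraph,
    // rather than the whole match, which also includes the <p ...> tag.
    public static string[] ExtractContents(string sourceCode)
    {
        var k = new Regex(@"<p class=""content"">[\r\n\s]*(\S.*)");
        MatchCollection g = k.Matches(sourceCode);

        var result = new string[g.Count];
        for (int i = 0; i < g.Count; i++)
        {
            // TrimEnd drops a trailing '\r' that '.' can pick up on CRLF input.
            result[i] = g[i].Groups[1].Value.TrimEnd();
        }
        return result;
    }

    static void Main()
    {
        // Hypothetical sample input, shaped like the HTML in the question.
        string html =
            "<p class=\"content\">\n\n   first text\n\n </p>\n" +
            "<p class=\"content\">\n  second text\n </p>";

        foreach (string s in ExtractContents(html))
        {
            Console.WriteLine(s);
        }
    }
}
```

Because the pattern has no multiline or single-line options, the group captures only the first non-blank line of each paragraph.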

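But what I want is to extract only those links where the first match does not contain XYZ, but at least one of the other matches does.

For example:

The source code of the first link contains XYZ in the first and third matches <- wrong

The source code of the second link contains XYZ only in the first match <- wrong

The source code of the third link contains XYZ only in the third match <- success!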

Solution

I collect all the matches:

MatchCollection b1 = Regex.Matches(sourceCode, @"<p class=""content"">[\r\n\s]*(\S.*)");

Next, I check that the first match does not contain "example":

if (!b1[0].ToString().Contains("example"))

and then check the result of this function, which reports whether any of the later matches contains it:

bool checkAnother(int amount, MatchCollection m)
{     
    for (int i=1; i<=amount-1; i++)
    {
        if (m[i].ToString().Contains("example"))
            return true;
    }
    return false;
}

Here is the resulting code:

            MatchCollection b1 = Regex.Matches(sourceCode, @"<p class=""content"">[\r\n\s]*(\S.*)");

            // Guard against pages with no matches before indexing b1[0].
            if (b1.Count > 0 && !b1[0].ToString().Contains("example") && checkAnother(b1.Count, b1))
            {
                dataGridView1.Rows[i].Cells[2].Value = "GOOD";
            }

Answer 1

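What you are trying to do is not a good fit for regular expressions.

It may be possible with multiline matching, capture groups and lookarounds, but IMO it is not worth putting a lot of effort into an unmaintainable solution.

Try validating the matches you found in a post-processing step instead. Assuming you grab the matches like this: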

var g = k.Matches(sourceCode);

...you can easily achieve this with:

var isFirstOk = !g[0].Value.Contains("XYZ");
var areAllOk = isFirstOk && g.Cast<Match>().Skip(1).Any(m => m.Value.Contains("XYZ"));

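areAllOk is true only when some match contains XYZ and that match is not the first one. Note that g[0] throws an ArgumentOutOfRangeException when there are no matches at all, so check g.Count first.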
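To make the post-processing idea concrete, here is a small self-contained sketch run against the three cases from the question. It is only an illustration under assumptions: the class MatchFilter, the method IsWanted and the helper Page (which builds fake page source) are hypothetical names, and the check looks at Groups[1] rather than Value so that the tag itself is never searched.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class MatchFilter
{
    static readonly Regex ContentRegex =
        new Regex(@"<p class=""content"">[\r\n\s]*(\S.*)");

    // True when the first match lacks the keyword and at least one
    // later match contains it. Pages with no matches are rejected.
    public static bool IsWanted(string sourceCode, string keyword)
    {
        MatchCollection g = ContentRegex.Matches(sourceCode);
        if (g.Count == 0) return false;                          // nothing to index
        if (g[0].Groups[1].Value.Contains(keyword)) return false; // first match must be clean
        return g.Cast<Match>().Skip(1)
                .Any(m => m.Groups[1].Value.Contains(keyword));
    }

    // Hypothetical helper: builds page source with one content paragraph per argument.
    public static string Page(params string[] contents)
    {
        return string.Concat(
            contents.Select(c => "<p class=\"content\">\n   " + c + "\n</p>\n"));
    }

    static void Main()
    {
        // The three cases from the question, in order.
        Console.WriteLine(IsWanted(Page("XYZ", "abc", "XYZ"), "XYZ")); // False
        Console.WriteLine(IsWanted(Page("XYZ", "abc", "def"), "XYZ")); // False
        Console.WriteLine(IsWanted(Page("abc", "def", "XYZ"), "XYZ")); // True
    }
}
```

Keeping the rule in plain C# rather than in the pattern means the regex only has to find the paragraphs, and the "not first, but somewhere later" condition stays readable.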
