Question

I've had no luck searching Google. I'm trying to extract links that come in this format:

<cite class=Rm>https://www.example.com/<b>index</b>.<b>php</b>?<b>username</b>=laura</cite>

The result should be: https://www.example.com/index.php?username=laura

Answer 1

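If you don't need to pair opening and closing tags, you can simply strip every tag.

    // Remove anything that looks like a tag: '<', any run of non-'>' characters, then '>'.
    string input = "<cite class=Rm>https://www.example.com/<b>index</b>.<b>php</b>?<b>username</b>=laura</cite>";
    string pattern = "<[^>]*>";
    string replacement = "";
    string result = System.Text.RegularExpressions.Regex.Replace(input, pattern, replacement);
    // result: https://www.example.com/index.php?username=laura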

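Otherwise you need to use balancing groups. I don't know how to do it in a single step, but you can try this:

    string input = "<cite class=Rm>https://www.example.com/<b>index</b>.<b>php</b>?<b>username</b>=laura</cite>";
    // 'tag' captures the element name (it stops at whitespace or '>'); \k'tag' requires the closing tag to use the same name.
    // 'close-open' is a .NET balancing group: it pops the 'open' capture once the closing tag is found.
    string pattern = "(?'open'<(?'tag'[^\\s>]+)[^>]*>)(?'middle'.*?)(?'close-open'</\\k'tag'>)";
    string replacement = "${middle}";
    // Each pass unwraps one nesting level: the first removes <cite>, the second removes the <b> pairs.
    string step1 = System.Text.RegularExpressions.Regex.Replace(input, pattern, replacement);
    string result = System.Text.RegularExpressions.Regex.Replace(step1, pattern, replacement);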
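
The two passes above are enough for this input because the tags are nested two levels deep. For input of unknown depth, one option is to repeat the replacement until the string stops changing. The following is only a minimal sketch of that idea: `UnwrapPairedTags` is a hypothetical helper name, the pattern is simplified to rely on the `\k'tag'` backreference alone (no balancing group), and the code assumes C# 9 or later so it can run as top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: repeatedly unwraps <tag ...>...</tag> pairs until none are left.
static string UnwrapPairedTags(string html)
{
    // 'tag' is the element name; the backreference forces a closing tag with the same name.
    const string pattern = @"<(?'tag'[^\s>]+)[^>]*>(?'middle'.*?)</\k'tag'>";
    string previous;
    do
    {
        previous = html;
        // One pass removes the outermost matching pair(s) and keeps their content.
        html = Regex.Replace(html, pattern, "${middle}");
    } while (html != previous); // every successful pass shortens the string, so this terminates
    return html;
}

string input = "<cite class=Rm>https://www.example.com/<b>index</b>.<b>php</b>?<b>username</b>=laura</cite>";
Console.WriteLine(UnwrapPairedTags(input));
// prints: https://www.example.com/index.php?username=laura
```

Note that `.` does not match newlines by default, so markup spanning several lines would need `RegexOptions.Singleline`. Unpaired tags are left untouched by this approach.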

Answer 2

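In future, more examples and consistent syntax would be very helpful.

This regex assumes that every line you are trying to capture follows this format (the asterisks are wildcards, of course).

<cite class=Rm>*<b>*</b>.<b>*</b>?<b>*</b>=*</cite>

Here is the regex:

<cite class=Rm>(.*?)<b>(.*?)<\/b>\.<b>(.*?)<\/b>\?<b>(.*?)<\/b>=(.*?)<\/cite>

The replacement pattern would be something like this (sorry, I can't help much with C#):

$1$2.$3?$4=$5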
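
Since the question is about C#, here is one way the regex and replacement pattern above might be wired up with `System.Text.RegularExpressions`. Treat it as a sketch rather than the answer's own code: `TryRebuildUrl` is a hypothetical helper name, the forward slashes are left unescaped because .NET patterns have no delimiters, and the code assumes C# 9 or later so it can run as top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns true and the rebuilt URL when the line fits the assumed format.
static bool TryRebuildUrl(string line, out string url)
{
    // Same five capture groups as the regex above; '/' needs no escaping in .NET.
    const string pattern =
        @"<cite class=Rm>(.*?)<b>(.*?)</b>\.<b>(.*?)</b>\?<b>(.*?)</b>=(.*?)</cite>";
    Match match = Regex.Match(line, pattern);
    // Match.Result expands $1..$5 using the groups of this match only.
    url = match.Success ? match.Result("$1$2.$3?$4=$5") : "";
    return match.Success;
}

string sample = "<cite class=Rm>https://www.example.com/<b>index</b>.<b>php</b>?<b>username</b>=laura</cite>";
if (TryRebuildUrl(sample, out string url))
{
    Console.WriteLine(url); // prints: https://www.example.com/index.php?username=laura
}
```

Using `Regex.Match` instead of `Regex.Replace` makes a line that does not fit the format easy to detect, because `Regex.Replace` would return such a line unchanged.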

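Complex HTML matching, as the comments point out, is a house built on sand. For more complex matching it is better to use a parser, because then things like attribute order, the presence of particular elements, and so on only matter when you need them to.

Although this is not a complex match, I hope you will keep this in mind in your future endeavors.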

Regex to extract a URL embedded in HTML tags
