Question

After searching Stack Overflow, I found this regex pattern:

<a href="/site/file1.doc?id=1">link1</a>

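It captures all href values.

Now I need to restrict the pattern so that it captures only .doc or .docx values.

Note that a link may have additional text after the .doc or .docx extension.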

For example, if I have the link:

/site/file1.doc

The result should be:

XDocument testing = XDocument.Load(testXMLPath);
var xlWarranty = (from warranty in testing.Descendants("Warranty")
                    select new
                    {
                        Service = (string)warranty.Element("ServiceLevelDescription").Value,
                        Provider = (string)warranty.Element("ServiceProvider").Value,
                        StartDate = (string)warranty.Element("StartDate").Value,
                        EndDate = (string)warranty.Element("EndDate").Value,
                        TypeOfWarranty = (string)warranty.Element("EntitlementType").Value
                    }).ToList();

Thanks.

Answer 1

Try this:

/href=(['"])([^'"]+\.docx?(\?[^'"]*)?)\1/

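This requires that whatever follows ".doc" or ".docx" is either the end of the href or a question mark followed by something, i.e. it won't match "foo.doctor".

It also ensures, via a backreference, that the quotes at each end match.

See the live demo.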
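As a rough illustration of how this pattern behaves, here is a minimal Python sketch. The sample markup and the helper name `doc_links` are made up for this example and are not part of the original answer. The regex is the one shown above, without the surrounding `/` delimiters.

```python
import re

# Pattern from the answer above, written as a Python raw string.
# Group 1: the opening quote; group 2: the whole href value;
# group 3: the optional query string (including the leading "?").
DOC_HREF = re.compile(r"""href=(['"])([^'"]+\.docx?(\?[^'"]*)?)\1""")

# Hypothetical sample markup, made up for this illustration.
html = '''
<a href="/site/file1.doc?id=1">link1</a>
<a href='/site/file2.docx'>link2</a>
<a href="/site/foo.doctor">link3</a>
<a href="/site/page.html">link4</a>
'''

def doc_links(text):
    """Return the href values that point at .doc or .docx files."""
    return [m.group(2) for m in DOC_HREF.finditer(text)]

print(doc_links(html))  # ['/site/file1.doc?id=1', '/site/file2.docx']
```

Note that group 2 includes the query string when one is present. The "foo.doctor" link is skipped because the extension must be followed by either "?" or the closing quote.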

Answer 2

/href=['"]([^'"]+?\.docx?)[^'"]*['"]/

See it here: https://regex101.com/r/oS1cD0/2
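A minimal Python sketch of the idea in this answer, which captures only the part of the href up to and including the extension. It assumes the character class before the closing quote is meant to repeat zero or more times (`[^'"]*`), so that a link ending directly in .doc also matches. The helper name `doc_paths` and the sample links are hypothetical.

```python
import re

# Variant of the second pattern; the trailing [^'"]* (zero or more characters)
# is an assumption made here so that a link ending directly in .doc also matches.
PATH_ONLY = re.compile(r"""href=['"]([^'"]+?\.docx?)[^'"]*['"]""")

def doc_paths(text):
    """Return each matching href value cut off right after .doc or .docx."""
    return [m.group(1) for m in PATH_ONLY.finditer(text)]

# Hypothetical sample links, made up for this illustration.
print(doc_paths('<a href="/site/file1.doc?id=1">link1</a>'))  # ['/site/file1.doc']
print(doc_paths('<a href="/site/file2.docx">link2</a>'))      # ['/site/file2.docx']
# Trade-off: nothing constrains what follows the extension,
# so "foo.doctor" is accepted and captured as "foo.doc".
print(doc_paths('<a href="/site/foo.doctor">link3</a>'))      # ['/site/foo.doc']
```

The trade-off is that this form drops the query string from the captured value but is less strict about what may follow the extension.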

Regex to get only doc or docx values from href
