Question


I have a large HTML file and I want to keep every link that starts with "https://exampledomain.com/category/" and delete everything else. The HTML also contains links such as "https://exampledomain.com/edit/ ..." and "https://exampledomain.com/view/ ...", plus tags and text. I want to remove all of it except the "https://exampledomain.com/category/.../" links.

The final result must look like this:

extension String {
    // Returns `length` characters starting at offset `location`, or nil if that range is out of bounds.
    func substring(location: Int, length: Int) -> String? {
        guard location >= 0, length >= 0, count >= location + length else { return nil }
        let start = index(startIndex, offsetBy: location)
        let end = index(start, offsetBy: length)
        return String(self[start..<end])
    }
}

Any ideas? Thanks! :)

Answer 1

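As Alec suggested, I used search and replace to put each link on its own line (using \n)...

Search: (https://www.exampledomain/category/[^"]*)
(this matches each link up to the closing " that ends href="url")
Replace with: \n\n\1\n\n

Once that was done, I used Notepad++ "Ctrl + F > Mark" (with "Bookmark line" checked) to mark every line containing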

https://www.exampledomain/category/

Then I deleted the lines that were not marked, using menu > Search > Bookmark > Remove Unmarked Lines...
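For anyone who would rather script this than click through Notepad++, here is a minimal Python sketch of the same idea: match each category link up to its closing quote and keep only the matches. The sample HTML is made up for illustration (the real file is not shown in the question), and the dots in the domain are escaped, which the Notepad++ pattern above does not do.

```python
import re

# Hypothetical sample input; the real HTML is not shown in the question.
html = '''<p>text <a href="https://www.exampledomain/category/shoes/">Shoes</a>
<a href="https://www.exampledomain/edit/42">edit</a>
<a href="https://www.exampledomain/category/hats/">Hats</a></p>'''

# Same idea as the search pattern above: the prefix, then everything
# up to (but not including) the next double quote.
pattern = re.compile(r'https://www\.exampledomain/category/[^"]*')

# findall returns the matched links in document order.
links = pattern.findall(html)

# One link per line, everything else dropped.
print("\n".join(links))
```

This skips the mark-and-delete step entirely, because only the matched text is collected in the first place.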

Thanks! :D

Answer 2

You can use this:

Wrap around: yes    Find what: .*?"(https://www.exampledomain/category/.*?)"|.*
Replace with: \1\n
Regular expression: yes    . matches newline: yes

Click Replace All.
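To see why this single pass works, here is a rough Python sketch of the same find-and-replace. It is only an approximation: Notepad++ uses a different regex engine, the sample text is invented, and the dots in the domain are escaped here. The first alternative lazily swallows everything up to the next quoted category link and captures the link; the second alternative (.*) swallows whatever is left over, with nothing captured.

```python
import re

# Hypothetical sample input; the real HTML is not shown in the question.
html = ('aaa "https://www.exampledomain/category/x/" bbb '
        '"https://www.exampledomain/edit/1" ccc\n'
        '"https://www.exampledomain/category/y/" ddd')

# re.DOTALL plays the role of the ". matches newline" checkbox.
pattern = re.compile(
    r'.*?"(https://www\.exampledomain/category/.*?)"|.*',
    re.DOTALL,
)

# \1 is empty when the second alternative matched (Python 3.5+),
# so leftover text is replaced by a bare newline.
replaced = pattern.sub(r"\1\n", html)

# Drop the blank lines left behind by the leftover-text matches.
cleaned = [line for line in replaced.splitlines() if line]
print("\n".join(cleaned))
```

The replacement can leave a few empty lines at the end, which is why the sketch filters them out afterwards.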

Notepad++ regex to get all full links in HTML that start with a given string
