Question

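For my project, I am parsing multiple JIRA tickets whose description fields contain text in varying formats (i.e. the order and number of links can change from ticket to ticket), and removing all links that start with a specific URL (e.g. www.website.com/app/client=?................). I am using Python to do this.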

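The problem I am facing is that I am not sure how to capture the whole URL, because its length and format change every time. I tried the re library and got a match on only part of the URL (e.g. up to "/monitor", but not the rest, as in "/monitor-text.random12443").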

Should I use the urlparse library instead? If so, how can I identify the individual URLs without including the rest of the string?

Summary of the question:

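How do I parse a multi-line string, identify multiple links that begin with a certain prefix (e.g. www.removethisurl.com/), and remove or replace each of those links in full without changing the rest of the string?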
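To make the goal concrete, here is a minimal sketch of one possible approach using `re.sub`. It assumes a URL ends at the first whitespace character, `|`, or `]`, and that the scheme and `www.` are optional. The function name `remove_links`, the prefix `removethisurl.com/`, and the sample text are illustrative placeholders, not taken from the real tickets.

```python
import re


def remove_links(text, prefix, replacement=""):
    """Replace every URL in `text` that starts with `prefix` (hypothetical helper)."""
    pattern = re.compile(
        r"(?<![\w.:/-])"              # do not start in the middle of another host name
        r"(?:https?://)?(?:www\.)?"   # optional scheme and "www."
        + re.escape(prefix)           # the literal prefix to look for
        + r"[^\s|\]]*",               # assumed: the URL runs up to whitespace, "|" or "]"
        flags=re.IGNORECASE,
    )
    return pattern.sub(replacement, text)


sample = (
    "h3. URL section\n"
    "keep | https://keepthisurl.com/page?x=1\n"
    "drop | https://www.removethisurl.com/app/monitor-text.random12443\n"
    "\n"
    "www.removethisurl.com/sv.do?id=abc%2Fdef\n"
    "h3. More text\n"
)

cleaned = remove_links(sample, "removethisurl.com/", "[link removed]")
print(cleaned)
```

Because the pattern consumes everything up to the next delimiter, the match does not stop at "/monitor" the way a narrower pattern would. The delimiter set is a guess about how the description text is laid out and would likely need adjusting for the real data.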

Any and all help would be greatly appreciated!

For reference, here is an example of the string in the description field:

h1. Title of this section
Random text 
more random text
.
.
.
h3. URL section
Random text
description of the URL | Url that does NOT need to be changed
description of the URL | Url that does NOT need to be changed

Random text:
description of the URL | Url that DOES need to be changed

Url that DOES need to be changed
.
.
.
h3. More text

more random URLS that DO NOT need to be changed

Sample code that I have tried:

import re
from urllib.parse import urlparse

# json_data_comp and totalComp come from earlier code that is not shown here.

# Loop over each issue in the list
for issue in range(totalComp):
    # 'fields' is a dict with the following format {'customfield..' : 'URL'}
    dataToChange = json_data_comp['issues'][issue]['fields']
    print("Issue {} has the following componentfield_11111 entry: {}".format(issue, dataToChange))
    testStr = dataToChange['customfield_15462']

    # urlparse treats the whole string as a single URL
    x = urlparse(testStr)
    print(f'X fragment is: {x.fragment}')
    rest = re.search(r'\S+', x.fragment)  # First run of non-whitespace characters, or None
    print(f'X fragment is: {rest}')

Sample output:

Issue 23 has the following componentfield_11111 entry: {'customfield_15462': 'Production: https://www.URLtoChange.com/sv.do?id=KTy9VBJe2rgWug1BNFbCJYnyuE37TMvtPzQQbJiQpCAGtN9msOPrrcYb4DvsQiY%2FUR5WD%2FXshycb%0AxrwvH6CoPHoiIFDuVU4z      Preview Texts: https://URL.to.NOT.change.com/?filter=HCPms'}

X fragment is: 

X fragment is: None
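The empty fragment in the output above is expected: `urlparse` treats its whole argument as one URL and does not search for URLs inside free text, and the fragment is only the part after a `#`, which this string does not contain. A hedged sketch of how `urlparse` could still be used: first locate each candidate URL with a regular expression, then parse each one separately and compare its host name. The helper name `strip_urls_by_host`, the `<removed>` marker, and the shortened `id` value are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse


def strip_urls_by_host(text, host, replacement=""):
    """Replace each http(s) URL in `text` whose host name equals `host` (hypothetical helper)."""
    def swap(match):
        url = match.group(0)
        # .hostname is already lower-cased by urlparse; it is None if no host is present
        hostname = urlparse(url).hostname or ""
        return replacement if hostname == host.lower() else url

    # Assumption: a URL runs from "http(s)://" to the next whitespace character.
    return re.sub(r"https?://\S+", swap, text)


# Shortened stand-in for the customfield_15462 value shown in the sample output
field = (
    "Production: https://www.URLtoChange.com/sv.do?id=KTy9VBJe%2FUR5WD "
    "Preview Texts: https://URL.to.NOT.change.com/?filter=HCPms"
)

result = strip_urls_by_host(field, "www.urltochange.com", "<removed>")
print(result)
```

Comparing parsed host names rather than raw text avoids false matches on look-alike domains, at the cost of only handling URLs that include an explicit `http://` or `https://` scheme.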

Parse and replace multiple URLs in a multi-line string

0 answers: