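For my project, I am parsing multiple JIRA tickets whose description fields contain text in varying formats (i.e., the order and number of links can change from ticket to ticket), and I need to remove every link that starts with a specific URL (e.g., www.website.com/app/client =?................). I am using Python to do this.
The problem I'm facing is that I'm not sure how to match the entire URL, because its length and format change every time. I tried the re library, but only got something that matches part of the URL (e.g., everything up to "/monitor", without the part that follows it, as in "/monitor-text.random12443").
Should I be using the urlparse library instead? If so, how can I identify the different URLs without including the rest of the string?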
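Summary of the problem:
How do I parse a multi-line string, identify multiple links that begin with a certain prefix (e.g., www.removethisurl.com/), and remove or replace each of those links in full without changing the rest of the string?
Any and all help would be greatly appreciated!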
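To make the summary concrete, here is a minimal sketch of one possible approach using re.sub. It assumes that a link ends at the first whitespace character and that the scheme (http:// or https://) is optional. PREFIX, remove_links, and the sample text are hypothetical names and data made up for illustration; they do not come from the real tickets.

```python
import re

# Hypothetical prefix; substitute the real start of the links to remove.
PREFIX = "www.removethisurl.com/"

# Optional scheme, then the escaped prefix, then every following
# non-whitespace character.
# Assumption: a link ends at the first whitespace character.
pattern = re.compile(r"(?:https?://)?" + re.escape(PREFIX) + r"\S*")

def remove_links(text, replacement=""):
    """Replace every link that starts with PREFIX; leave all other text untouched."""
    return pattern.sub(replacement, text)

# Made-up multi-line description, loosely modelled on a JIRA description field.
sample = (
    "h3. URL section\n"
    "description | https://www.keepthis.com/page?x=1\n"
    "description | https://www.removethisurl.com/app/client=abc/monitor-text.random12443\n"
    "www.removethisurl.com/other\n"
)

cleaned = remove_links(sample, "[removed]")
print(cleaned)
```

Because \S* consumes everything up to the next whitespace, the match does not stop at "/monitor". If the links sit inside JIRA's [text|url] markup, the closing bracket would be swallowed as well, so the character class would need to exclude "]" in that case.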
For reference, here is an example of the string in the description field:
h1. Title of this section
Random text
more random text
.
.
.
h3. URL section
Random text
description of the URL | Url that does NOT need to be changed
description of the URL | Url that does NOT need to be changed
Random text:
description of the URL | Url that DOES need to be changed
Url that DOES need to be changed
.
.
.
h3. More text
more random URLS that DO NOT need to be changed
Sample code that I've tried:
import re
from urllib.parse import urlparse

# json_data_comp (the parsed JIRA response) and totalComp (the number of issues)
# are assumed to be defined earlier in the script.
# Parse through each issue in the list
for issue in range(totalComp):
    # 'fields' is a dict with the following format: {'customfield..' : 'URL'}
    dataToChange = json_data_comp['issues'][issue]['fields']
    print("Issue {} has the following componentfield_11111 entry: {}".format(issue, dataToChange))
    testStr = json_data_comp['issues'][issue]['fields']['customfield_15462']
    x = urlparse(testStr)  # parses the whole field as if it were a single URL
    print(f'X fragment is: {x.fragment}')
    rest = re.search(r'[^\s]+', x.fragment)  # Selects up until the whitespace
    print(f'X fragment is: {rest}')
Sample output:
Issue 23 has the following componentfield_11111 entry: {'customfield_15462': 'Production: https://www.URLtoChange.com/sv.do?id=KTy9VBJe2rgWug1BNFbCJYnyuE37TMvtPzQQbJiQpCAGtN9msOPrrcYb4DvsQiY%2FUR5WD%2FXshycb%0AxrwvH6CoPHoiIFDuVU4z Preview Texts: https://URL.to.NOT.change.com/?filter=HCPms'}
X fragment is:
X fragment is: None
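A possible reading of this output: urlparse treats its whole argument as a single URL, and the fragment is only the part after a "#". The field value contains no "#", so the fragment is empty and re.search finds nothing in it. The sketch below, offered under assumptions rather than as a definitive fix, applies urlparse to each whitespace-separated token instead. TARGET_HOST, is_target, and strip_target_links are hypothetical names introduced for illustration.

```python
import re
from urllib.parse import urlparse

# Field value copied from the sample output above.
text = (
    "Production: https://www.URLtoChange.com/sv.do?id=KTy9VBJe2rgWug1BNFbCJYnyuE37TMvtPzQQbJiQpCAGtN9msOPrrcYb4DvsQiY%2FUR5WD%2FXshycb%0AxrwvH6CoPHoiIFDuVU4z "
    "Preview Texts: https://URL.to.NOT.change.com/?filter=HCPms"
)

# The whole field has no '#', so urlparse reports an empty fragment.
print(repr(urlparse(text).fragment))  # prints ''

# Assumption: links to remove are identified by this host name (lower-cased).
TARGET_HOST = "www.urltochange.com"

def is_target(token):
    """True if the token is an http(s) URL whose host is TARGET_HOST."""
    parts = urlparse(token)
    return parts.scheme in ("http", "https") and parts.netloc.lower() == TARGET_HOST

def strip_target_links(value):
    """Drop matching tokens; all other tokens and the original whitespace are kept."""
    return re.sub(r"\S+", lambda m: "" if is_target(m.group()) else m.group(), value)

result = strip_target_links(text)
print(result)
```

Matching on \S+ rather than calling str.split keeps the original spacing and line breaks, so the rest of the description is not reflowed. The removed link leaves its surrounding whitespace behind, which may need trimming afterwards.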