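I'm trying to remove the %20 sequences from a URL, after converting it to a string, using Python (not C#, PHP, or other tools). But whichever form I try, the sequences stay unchanged.
Here is the code I tried: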
url = 'https://www.amazon.com/s?k=hbb%20magic%20dress' # Type string
title_text_data_file = url.split('=')[1]
if '%20' in title_text_data_file:
    title_text_data_file = title_text_data_file.replace('%20+', '')
keyword = title_text_data_file.replace('+', ' ')
title_text_data_file = title_text_data_file + ".txt"
print('Keyword:',keyword,'- File title:',title_text_data_file,'- URL:',url)
This is what I get:
Keyword: hbb%20magic%20dress - File title: hbb%20magic%20dress.txt - URL: https://www.amazon.com/s?k=hbb%20magic%20dress
This is what I want to get:
Keyword: hbb magic dress - File title: hbb+magic+dress.txt - URL: https://www.amazon.com/s?k=hbb%20magic%20dress
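Answer 0 (score: 3)
It is actually better to use a library designed for handling URLs, because it will decode every URL-encoded character, not just spaces (%20). The standard library provides the urllib.parse module.
You want to use: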
import urllib.parse
url = 'https://www.amazon.com/s?k=hbb%20magic%20dress'
# This extracts the query part from the url
query = urllib.parse.urlparse(url).query
# This gets the first k parameter, decoding any urlencoded character, not only spaces(%20)
keyword = urllib.parse.parse_qs(query)['k'][0]
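Building on the snippet above, here is a minimal sketch that produces both values the question asks for. The helper name keyword_and_filename and its param argument are made up for illustration, and using urllib.parse.quote_plus to build the '+'-separated file title is one possible choice, not something the question requires.

```python
import urllib.parse


def keyword_and_filename(url, param="k"):
    """Return (decoded keyword, '+'-joined file name) for a query parameter.

    Hypothetical helper: the name and the 'param' argument are assumptions.
    """
    query = urllib.parse.urlparse(url).query
    values = urllib.parse.parse_qs(query).get(param)
    if not values:
        raise ValueError(f"no {param!r} parameter in {url!r}")
    keyword = values[0]  # already decoded, e.g. '%20' became ' '
    # quote_plus re-encodes spaces as '+', which gives the desired file title
    filename = urllib.parse.quote_plus(keyword) + ".txt"
    return keyword, filename


url = 'https://www.amazon.com/s?k=hbb%20magic%20dress'
keyword, title_text_data_file = keyword_and_filename(url)
print('Keyword:', keyword, '- File title:', title_text_data_file, '- URL:', url)
```

Note that quote_plus also percent-encodes other reserved characters (such as '/'), which may or may not be what you want in a file name.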
Answer 1 (score: 0)
Python's urllib.parse module can be used to decode a URL-encoded string.
Example
import urllib.parse
url = 'https://www.amazon.com/s?k=hbb%20magic%20dress' # Type string
urllib.parse.unquote(url) # Returns 'https://www.amazon.com/s?k=hbb magic dress'
urllib.parse.unquote(url).replace(" ","") # Returns 'https://www.amazon.com/s?k=hbbmagicdress'
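One detail worth knowing when using unquote: it decodes %XX escapes only, while unquote_plus also treats '+' as a space. The short sketch below contrasts the two; the variable names are only for illustration.

```python
import urllib.parse

encoded_percent = 'hbb%20magic%20dress'
encoded_plus = 'hbb+magic+dress'

# unquote decodes %XX escapes but leaves '+' untouched
from_percent = urllib.parse.unquote(encoded_percent)
plus_kept = urllib.parse.unquote(encoded_plus)

# unquote_plus additionally converts '+' to a space
from_plus = urllib.parse.unquote_plus(encoded_plus)

print(from_percent)  # hbb magic dress
print(plus_kept)     # hbb+magic+dress
print(from_plus)     # hbb magic dress
```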
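Answer 2 (score: -1)
str.replace(old, new[, count])
You can't replace a substring that isn't there: the original code searches for the literal text '%20+', which never occurs in the string, so replace() returns it unchanged. Search for '%20' instead: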
title_text_data_file = url.split('=')[1]
if '%20' in title_text_data_file:
    key = '%20'
    title_text_data_file = title_text_data_file.replace(key, '+')
keyword = title_text_data_file.replace('+', ' ')
title_text_data_file = title_text_data_file + ".txt"
print('Keyword:',keyword,'- File title:',title_text_data_file,'- URL:',url)
Keyword: hbb magic dress - File title: hbb+magic+dress.txt - URL: https://www.amazon.com/s?k=hbb%20magic%20dress
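To illustrate the point about literal matching: str.replace does not interpret its first argument as a pattern, so '%20+' is not read as "one or more %20". If that was the intent (this is only a guess at what the question's code meant), a regular expression via re.sub would be needed. A small sketch:

```python
import re

text = 'hbb%20magic%20dress'

# str.replace matches literally; '%20+' never occurs, so nothing changes
unchanged = text.replace('%20+', '')

# Replacing the literal '%20' works as expected
with_plus = text.replace('%20', '+')

# With a regex, (?:%20)+ matches one or more consecutive '%20' sequences
collapsed = re.sub(r'(?:%20)+', '+', 'hbb%20%20magic%20dress')

print(unchanged)  # hbb%20magic%20dress
print(with_plus)  # hbb+magic+dress
print(collapsed)  # hbb+magic+dress
```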