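My JSON data contains special Unicode escape sequences such as "\u00E0", "\u00FB", etc. I want to replace all of them with the ordinary letters, e.g. "\u00E0" with "à" and "\u00FB" with "û". There are a lot of these characters, so I would have to find a complete list of them.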
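For context: a JSON parser already converts `\uXXXX` escape sequences into the corresponding characters, so a hand-maintained list of replacements should not be necessary once the text is valid JSON. A minimal check with Python's standard `json` module, using a made-up sample string (not the TripAdvisor data):

```python
import json

# A hypothetical JSON document in which the accented letters are written
# as six-character \uXXXX escape sequences (backslash, u, four hex digits).
raw = '{"name": "Caf\\u00E9 \\u00E0 Z\\u00FCrich", "note": "s\\u00FBr"}'

# json.loads decodes the escapes itself; no replace() calls are needed.
data = json.loads(raw)
print(data["name"])  # Café à Zürich
print(data["note"])  # sûr
```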
I have already tried replacing them with a chain of replace calls:

dictData.replace("\u00E0", "à").replace("\u00E4", "ä").replace("\u00E2", "â").replace("\u00E7", "ç").replace("\u00E8", "è").replace("\u00E9", "é").replace("\u00EA", "ê").replace("\u00EB", "ë").replace("\u00EE", "î").replace("\u00EF", "ï").replace("\u00F4", "ô").replace("\u00F6", "ö").replace("\u00F9", "ù").replace("\u00FB", "û").replace("\u00FC", "ü")

and with

dictData.encode('utf-8').decode('unicode_escape')

but neither worked.
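One likely reason the replace chain has no effect, independent of how the page is fetched: in an ordinary (non-raw) Python string literal, `"\u00E0"` is already the single character "à", so `replace("\u00E0", "à")` replaces "à" with "à". Matching the literal six-character sequence would need a raw string such as `r"\u00E0"`. A small sketch with a hypothetical sample string:

```python
# In a normal string literal, "\u00E0" is one character: "à".
print(len("\u00E0"))    # 1
print("\u00E0" == "à")  # True

# A raw string keeps the six characters: backslash, u, 0, 0, E, 0.
print(len(r"\u00E0"))   # 6

text = r"voil\u00E0"    # hypothetical text containing the literal escape

# No match: this searches for "à", which text does not contain.
unchanged = text.replace("\u00E0", "à")

# Match: this searches for the literal backslash sequence.
fixed = text.replace(r"\u00E0", "à")
print(fixed)            # voilà
```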
Here is my code, showing where the JSON data comes from:
import re, sys
from urllib.request import urlopen
import json
patternScript = re.compile("""<script>window\.\_\_WEB\_CONTEXT\_\_\=\{pageManifest:(.*?)\};</script>""", re.DOTALL)
with urlopen("https://www.tripadvisor.ch/Hotel_Review-g188113-d228146-Reviews-Coronado_Hotel-Zurich.html") as response:
    source = str(response.read())
dictData = patternScript.search(source).group(1).replace('\\ "', '\\"').replace('\\"', '\"').replace("\'", "\\'")
dictData2 = dictData.encode('utf-8').decode('unicode_escape')
#dictData2 = dictData.replace("\u00E0", "à").replace("\u00E4", "ä").replace("\u00E2", "â").replace("\u00E7", "ç").replace("\u00E8", "è").replace("\u00E9", "é").replace("\u00EA", "ê").replace("\u00EB", "ë").replace("\u00EE", "î").replace("\u00EF", "ï").replace("\u00F4", "ô").replace("\u00F6", "ö").replace("\u00F9", "ù").replace("\u00FB", "û").replace("\u00FC", "ü")
jsonData = json.loads(dictData2)
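A related pitfall in the code above: `str(response.read())` does not decode the response, it returns the `repr` of the bytes object (`b'...'` with `\xNN` escapes). Running `unicode_escape` on that repr then turns each UTF-8 byte into its own code point, which produces mojibake instead of the original letters. A minimal sketch with a hypothetical one-word body instead of the real page:

```python
# Hypothetical response body: "Voilà" encoded as UTF-8, like urlopen().read().
body = "Voilà".encode("utf-8")

# str() on bytes gives the repr, not decoded text.
as_repr = str(body)
print(as_repr)  # b'Voil\xc3\xa0'

# unicode_escape reads \xc3 and \xa0 as two separate code points
# ("Ã" and a no-break space), so the "à" is lost.
mangled = as_repr.encode("utf-8").decode("unicode_escape")

# Decoding the bytes with the correct codec gives the real text.
decoded = body.decode("utf-8")
print(decoded)  # Voilà
```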
Do you have any suggestions? Thanks in advance!
Answer (score: 0)
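Hope this works for you: decode the response bytes as UTF-8 instead of wrapping them in str(), and pass the extracted text straight to json.loads, which handles the \uXXXX escapes itself: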
with urlopen("https://www.tripadvisor.ch/Hotel_Review-g188113-d228146-Reviews-Coronado_Hotel-Zurich.html") as response:
    source = response.read()
dictData = patternScript.search(source.decode("utf-8")).group(1)
jsonData = json.loads(dictData)
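To try the answer's approach without a network request, here is a self-contained sketch. It assumes the text captured after `pageManifest:` is valid JSON (as the answer implies); the page body is a hypothetical stand-in for `response.read()`, and `pattern_script` is the question's `patternScript` rewritten as a raw string:

```python
import json
import re

# Equivalent of patternScript from the question, written as a raw string.
pattern_script = re.compile(
    r"<script>window\.__WEB_CONTEXT__=\{pageManifest:(.*?)\};</script>",
    re.DOTALL,
)

# Hypothetical page body as UTF-8 bytes. It contains a literal "ü"
# and a JSON-escaped "\u00FB" (the backslash is doubled in the literal).
page = (
    "<html><script>window.__WEB_CONTEXT__={pageManifest:"
    '{"hotel": "Coronado", "city": "Z\u00fcrich", "note": "s\\u00FBr"}'
    "};</script></html>"
).encode("utf-8")

# Decode the bytes first, then let json.loads decode the \uXXXX escapes.
match = pattern_script.search(page.decode("utf-8"))
data = json.loads(match.group(1))
print(data["city"], data["note"])  # Zürich sûr
```

If the real page's object is not strict JSON (for example, unquoted keys), json.loads will raise an error, and the extracted text would need further handling.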