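So I'm working on a script that automatically downloads and writes data from a web service that delivers its information as JSON. The data covers Canadian political parties, so accented characters come up often.
For example, to get the data for candidates representing the "Bloc Québécois" party, I need to access this URL:
Unfortunately, the simple solution of replacing é with e doesn't work.
So my script looks like this: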
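I know this has something to do with UTF-8 encoding, but I'm having a hard time getting around it, and the other links I found here and on other sites didn't help.
I tried adding .encode('utf-8') to the urlopen call, as follows: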
import urllib.request  # urlopen lives in the urllib.request submodule

#party_name_list = ["Conservative", "Liberal", "NDP", "Green%20Party", "Bloc%20Québécois", "Forces%20et%20Démocratie", "Libertarian", "Christian%20Heritage"]
party_name_list = ["Bloc%20Québécois"]

for party_name in party_name_list:
    # Request the candidate list for one party and save the raw JSON bytes to disk.
    with urllib.request.urlopen(r"https://represent.opennorth.ca/candidates/house-of-commons/?limit=1000&party_name={}".format(party_name)) as url:
        with open(r"F:\electoral_map\20150914\candidates\candidates_{0}.json".format(party_name), "wb+") as f:
            f.write(url.read())
    print("finished {0}".format(party_name))
print("all done")
But this only makes the files come back empty, because it now calls the URL:
https://represent.opennorth.ca/candidates/house-of-commons/?limit=1000&party_name=b'Bloc%20Qu\xc3\xa9b\xc3\xa9cois'
Can someone help me understand how to sort out this mess?
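Answer 0 (score: 1)
I solved it, though I don't think it's the most elegant solution and, to be honest, I don't fully understand it. Maybe someone can explain it better, but using urllib.parse.unquote_plus() helped me: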
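The snippet that originally accompanied this answer is not preserved here, so the following is only a minimal sketch of how the idea might look. It assumes the party names keep the hand-written `%20` from the question: `urllib.parse.unquote_plus()` turns each name back into plain text, and `urllib.parse.quote()` then percent-encodes the whole name as UTF-8. The helper name `build_url` is hypothetical, and the network call is left commented out.

```python
from urllib.parse import quote, unquote_plus

BASE_URL = ("https://represent.opennorth.ca/candidates/house-of-commons/"
            "?limit=1000&party_name={}")

def build_url(party_name):
    """Return the request URL with party_name fully percent-encoded (hypothetical helper)."""
    # Undo the manual %20 so we start from plain text, e.g. "Bloc Québécois".
    plain = unquote_plus(party_name)
    # quote() encodes to UTF-8 by default and percent-escapes every unsafe byte.
    return BASE_URL.format(quote(plain))

url = build_url("Bloc%20Qu\u00e9b\u00e9cois")
print(url)

# Assumed usage, mirroring the question's script:
# import urllib.request
# with urllib.request.urlopen(url) as response:
#     data = response.read()
```

Decoding first avoids double-encoding the existing `%20`; names without special characters pass through unchanged.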
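Answer 1 (score: 1)
You're mixing apples and oranges. The bytes used to represent a string such as "Québécois" or " " depend on the character set and encoding. Modern websites will usually use UTF-8 in URLs, but that isn't guaranteed.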
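In UTF-8 (and in practically every other ASCII-compatible encoding), a space is represented by the single byte 0x20, which is what you see URL-encoded as %20. The character é (U+00E9) is encoded as the byte sequence 0xC3 0xA9 (though note that it can equivalently be decomposed into 0x65 0xCC 0x81!), and applying URL encoding to those bytes yields %C3%A9.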
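To make the byte values above concrete, here is a small sketch using only the standard library. It inspects the UTF-8 bytes of a space and of é, then shows what `urllib.parse.quote()` produces for each; the variable names are illustrative.

```python
from urllib.parse import quote

space_bytes = " ".encode("utf-8")
e_acute_bytes = "\u00e9".encode("utf-8")  # é as the single code point U+00E9

print(space_bytes.hex())     # 20
print(e_acute_bytes.hex())   # c3a9

# Percent-encoding writes each of those bytes as %XX.
print(quote(" "))            # %20
print(quote("\u00e9"))       # %C3%A9

encoded_name = quote("Bloc Qu\u00e9b\u00e9cois")
print(encoded_name)          # Bloc%20Qu%C3%A9b%C3%A9cois
```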
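But anyway, as you discovered, urllib handles this for you nicely and transparently, so you don't really need to understand the above. I think the code you mention in your own answer is correct and idiomatic.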
Getting this right in the general case requires at least a passing familiarity with the most common character encodings, as well as with Unicode normalization.
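As a rough illustration of why normalization matters here, the sketch below uses the standard `unicodedata` module to build the composed and decomposed forms of é and percent-encodes both. Which form a given server expects is an assumption you would need to check; NFC is only a common default.

```python
import unicodedata
from urllib.parse import quote

composed = unicodedata.normalize("NFC", "e\u0301")    # single code point U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")   # "e" followed by combining acute U+0301

# Both render as é, but they are different strings with different bytes.
print(composed == decomposed)   # False
print(quote(composed))          # %C3%A9
print(quote(decomposed))        # e%CC%81

# Normalizing before encoding gives one predictable URL form
# (assuming the service expects NFC).
normalized = quote(unicodedata.normalize("NFC", "Bloc Que\u0301be\u0301cois"))
print(normalized)               # Bloc%20Qu%C3%A9b%C3%A9cois
```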