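I want to parse the Accept-Language header. All the answers I have found cover parsing the string, but none of them handle malformed input.
For example, what should happen if a user sends the header Accept-Language: en,es;q=0.5;*;q=0.5, which is malformed because of the second ;? Is there a package that offers simple parsing and raises appropriate exceptions?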
Answer 0 (score: 1)
First, you should understand the correct format of the Accept-Language header: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4
There you can see that the Accept-Language header field is defined as:
Accept-Language = "Accept-Language" ":"
1#( language-range [ ";" "q" "=" qvalue ] )
language-range = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" )
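An example of a well-formed header is: Accept-Language: da, en-gb;q=0.8, en;q=0.7. Each comma separates one language tuple, and each tuple consists of a language-range followed by an optional quality weight (the q value).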
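As a small illustration, the language-range rule from the grammar above can be checked with a regular expression. This is only a sketch in Python: the names LANGUAGE_RANGE_RE and is_language_range are my own, not part of any library.

```python
import re

# language-range = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" )
# Hypothetical helper names, chosen for this illustration only.
LANGUAGE_RANGE_RE = re.compile(r"(?:[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*|\*)")


def is_language_range(token: str) -> bool:
    """Return True if the whole token matches the language-range rule."""
    return LANGUAGE_RANGE_RE.fullmatch(token) is not None
```

With this, "en-gb" and "*" are accepted, while "en_GB" (underscore) and a nine-letter subtag are rejected.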
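Now that you know how the Accept-Language header is defined, the only remaining question is how to parse it.
There are many ways to implement this depending on your language, so I'll write it as pseudocode: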
function parseAcceptLanguageHeader(headerValue):
    parsedLanguages = []
    languageStrings = headerValue.split(",")
    foreach languageStrings as S do
        parsedLanguages.add(parse(S))
    return parsedLanguages
// Here we define parse(S)
function parse(S): // expecting format of S to be like: 'language-range [";q=<number>"]'
    vals = S.trim().split(";") // remove leading and trailing spaces and split by ;
    if vals.length == 1: // means the 'q=qvalue' part is missing
        return vals[0].trim(), 1.0 // default q is 1.0; you can additionally verify that vals[0] is one of the languages that you support
    else if vals.length == 2:
        return vals[0].trim(), parseQuality(vals[1])
    else:
        raise an error ("Expected at most two tokens, but got: " + S)
// Implement parseQuality
function parseQuality(S):
    // We expect to see 'q=<number>'
    vals = S.split("=")
    if (vals.length != 2):
        raise an error ("Expected exactly two tokens for quality, but got: " + S)
    else if (vals[0].trim() != 'q'):
        raise an error ("Expected quality (q) but got: " + S)
    else:
        // q is a decimal between 0 and 1, so parse it as a float, not an int.
        // This can also throw an error, but I am not going to write the implementation for that function.
        return parseFloat(vals[1].trim())
Note that how you raise and handle errors will depend on the language you use.
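For reference, here is how the pseudocode above might look in Python. Treat it as a minimal sketch rather than a complete implementation: AcceptLanguageError and the function names are my own, and the qvalue pattern assumes the RFC 2616 rule (0 to 1 with at most three decimal places). It is deliberately strict, so empty list elements such as "en,,fr" are rejected as well.

```python
import re
from typing import List, Tuple


class AcceptLanguageError(ValueError):
    """Raised when an Accept-Language header value is malformed (hypothetical name)."""


# language-range = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" )
_LANGUAGE_RANGE = re.compile(r"(?:[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*|\*)")
# qvalue = ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] )
_QVALUE = re.compile(r"(?:0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?)")


def parse_quality(param: str) -> float:
    """Parse a 'q=<number>' parameter and return the number as a float."""
    name, sep, value = param.partition("=")
    if not sep or name.strip() != "q":
        raise AcceptLanguageError(f"Expected 'q=<number>' but got: {param!r}")
    value = value.strip()
    if _QVALUE.fullmatch(value) is None:
        raise AcceptLanguageError(f"Invalid quality value: {value!r}")
    return float(value)


def parse_language(item: str) -> Tuple[str, float]:
    """Parse one 'language-range[;q=<number>]' element."""
    parts = item.strip().split(";")
    language = parts[0].strip()
    if _LANGUAGE_RANGE.fullmatch(language) is None:
        raise AcceptLanguageError(f"Invalid language range: {language!r}")
    if len(parts) == 1:
        return language, 1.0  # q defaults to 1.0 when omitted
    if len(parts) == 2:
        return language, parse_quality(parts[1])
    raise AcceptLanguageError(f"Expected at most one ';' but got: {item!r}")


def parse_accept_language(header_value: str) -> List[Tuple[str, float]]:
    """Parse the header value (the part after 'Accept-Language:')."""
    return [parse_language(item) for item in header_value.split(",")]
```

Calling parse_accept_language("da, en-gb;q=0.8, en;q=0.7") returns the three (language, q) pairs in header order, while the header from the question, "en,es;q=0.5;*;q=0.5", raises AcceptLanguageError because its second element contains more than one ;.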