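I am writing a small part-of-speech application with Flask and the Python Natural Language Toolkit (NLTK) that returns the part-of-speech tags for a sentence submitted to a Flask API. This works fine when the submitted JSON is well formed. However, if the submission contains embedded quotes, how can I validate the JSON data and transform it as needed?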
Valid submission:
{
"pos_submission":"This is a valid json format."
}
Invalid submission (but very likely to occur):
{
"pos_submission":"This is an "invalid" json format due to the embedded quotes."
}
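To make the difference between the two payloads concrete, here is a minimal sketch using only the standard-library `json` module. The variable names are illustrative and not part of the application. It shows that the second body is rejected by a strict parser, and that a client which builds the body with `json.dumps` gets the inner quotes escaped automatically.

```python
import json

valid_body = '{"pos_submission": "This is a valid json format."}'
invalid_body = '{"pos_submission": "This is an "invalid" json format due to the embedded quotes."}'

# The well-formed body parses normally.
print(json.loads(valid_body)["pos_submission"])

# The body with unescaped inner quotes is not JSON, so parsing raises.
try:
    json.loads(invalid_body)
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)

# A client that serializes with json.dumps escapes the inner quotes as \".
sentence = 'This is an "invalid" json format due to the embedded quotes.'
escaped_body = json.dumps({"pos_submission": sentence})
print(escaped_body)

# The escaped body round-trips back to the original sentence.
assert json.loads(escaped_body)["pos_submission"] == sentence
```

If the client can be changed, generating the request body with a JSON serializer rather than by string concatenation avoids the problem at its source.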
Code:
from flask import Flask, jsonify, request
from nltk import pos_tag, word_tokenize

app = Flask(__name__)

@app.route('/pos', methods=['POST'])
def pos():
    data = request.json
    data_submission = data["pos_submission"]  # pull the sentence into variable
    tokenized_submission = word_tokenize(data_submission)  # tokenize sentence
    pos_results = pos_tag(tokenized_submission)  # get part of speech results
    pos_results = dict([('Data_Submission', data_submission),
                        ('POS_Tags', pos_results)])  # create a pos dictionary object
    return jsonify(pos_results)  # return dictionary object as json

if __name__ == '__main__':
    app.run(host='127.0.0.1', threaded=True, port=5000, debug=True)
What is the best option here for correcting embedded quotes in the submission and still returning the parts of speech?
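One possible direction, sketched under stated assumptions: parse strictly first, and only when that fails fall back to a best-effort extraction. The helper `extract_submission` and the pattern `_LENIENT` are hypothetical names introduced for illustration. The fallback assumes the body is an object whose only key is `pos_submission`, and it treats everything up to the last quote before the closing brace as the sentence. It is a heuristic, not a general JSON repair: backslash escapes in the recovered text are left undecoded.

```python
import json
import re

# Hypothetical fallback pattern: assumes a single-key object whose only field
# is "pos_submission" and whose value ends at the last quote before "}".
_LENIENT = re.compile(
    r'^\s*\{\s*"pos_submission"\s*:\s*"(.*)"\s*\}\s*$', re.DOTALL
)

def extract_submission(raw_body):
    """Return the submitted sentence, or None if it cannot be recovered."""
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError:
        # Strict parsing failed; try the lenient single-key heuristic.
        match = _LENIENT.match(raw_body)
        return match.group(1) if match else None
    if isinstance(data, dict):
        value = data.get("pos_submission")
        return value if isinstance(value, str) else None
    return None

broken = '{\n"pos_submission":"This is an "invalid" json format due to the embedded quotes."\n}'
print(extract_submission(broken))
```

In the Flask view, such a helper could be called on the raw body, for example `extract_submission(request.get_data(as_text=True))`, returning a 400 response when the result is `None`. Rejecting malformed JSON outright and asking clients to escape quotes properly is the stricter alternative.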