Python 2.7 json.load() mis-encodes a character with a diacritic

Date: 2015-02-04 16:37:37

Tags: python json mongodb

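I receive a string from a request that contains the character ñ. Before json.load(), this character is represented by the bytes \xc3\xb1. After the string is converted to a dict, the character is represented as u'\xf1'. I cannot insert this character into MongoDB; the insert fails with: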

strings in documents must be valid UTF-8:

But if I manually save \xc3\xb1 instead of u'\xf1', everything is saved fine.

Full code
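The two representations mentioned above are the same character seen at different layers: \xc3\xb1 is the UTF-8 byte encoding of ñ, and u'\xf1' is the decoded code point U+00F1. The following is a minimal sketch of that relationship, not the code from the question; the name raw_body and the one-field JSON document are assumptions made for illustration, and the explicit decode is there only so the sketch runs the same on Python 2.7 and Python 3.

```python
import json

# Hypothetical request body as raw bytes: the character is sent as the
# two UTF-8 bytes 0xC3 0xB1.
raw_body = b'{"text": "\xc3\xb1"}'

# Decode explicitly, then parse. The JSON parser returns text (unicode),
# not bytes.
params = json.loads(raw_body.decode('utf-8'))
text = params['text']

# The parsed value is one code point, U+00F1.
is_single_code_point = (len(text) == 1 and text == u'\xf1')

# Encoding it back to UTF-8 gives the original two bytes.
round_trip = text.encode('utf-8')
```

Under these assumptions the parsed value is already correct text, so the change from \xc3\xb1 to u'\xf1' is expected decoding rather than corruption.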

try:
    # Load params arriving as json data
    enc = 'UTF-8'
    print 'lal'
    params = json.loads(request.get_data())
    print params
    # Check all parameters
    customer_id = params.get('customer', '')
    check_credentials(customer_id, params.get('apikey', ''))
    collection_id = params.get('collection', '')
    if not collection_id or not str(collection_id).isdigit():
        raise Exception("Invalid collection")
    train_records = params.get('train', [])
    if not train_records:
        raise Exception("Train records are needed in the 'train' parameter")
    # Store the trained classifier in database for a better performance
    train_records = map(lambda x: x.values(), train_records)
    cl = NaiveBayesClassifier(train_records)
    pk = '%s__%i' % (customer_id, collection_id)
    data = {'_id': pk, 'customer': customer_id, 'collection': collection_id, 'classifier': pickle.dumps(cl), 'train': train_records}
    if db.classifiers.find_one({'_id': pk}):
        db.classifiers.update({'_id': pk}, data)
    else:
        db.classifiers.insert(data)
    # Asynchronously increase usage count in order to check rate limits
    gevent.spawn(increase_usage, customer_id)

except Exception as e:
    print e
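One possible explanation, which the source does not confirm, is that the failing value is the pickle.dumps(cl) byte string rather than the decoded text. In Python 2.7 pickle.dumps defaults to protocol 0, which can write a character such as U+00F1 as the single byte 0xF1, and that byte on its own is not valid UTF-8. The sketch below only checks this property of the pickle output; it pickles a bare string as a stand-in for the classifier, and the helper is_valid_utf8 is a hypothetical name introduced here.

```python
import pickle

# Stand-in for training text after JSON parsing (assumed value).
text = u'\xf1'

# Protocol 0 is the Python 2.7 default; it is passed explicitly so the
# sketch behaves the same on Python 3.
pickled = pickle.dumps(text, protocol=0)


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


# The pickle output is a byte string that is not valid UTF-8 ...
pickle_is_utf8 = is_valid_utf8(pickled)
# ... while the UTF-8 encoding of the same text is.
utf8_bytes_are_utf8 = is_valid_utf8(text.encode('utf-8'))
# The pickle itself is intact and round-trips to the original text.
restored = pickle.loads(pickled)
```

If this is the cause, wrapping the pickled bytes in bson.binary.Binary (from the bson package that ships with PyMongo) before storing them would have them saved as binary data instead of as a string. That suggestion has not been tested against the code above.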

JSON request

{
  "apikey": "yt1uy23123123123",
  "customer": 111111,
  "collection": 111111,
  "train": [
    {
      "text": "ñ",
      "label": "pos"
    }
  ]
}

0 answers:

No answers