Converting JSON to CSV: simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Date: 2019-02-14 11:38:48

Tags: python json csv yelp

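I followed this repo, https://github.com/ajmanser/Yelp, and everything worked. But when I tried to train the model from scratch, I got stuck at step 2 of its instructions (quoted below).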

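When I run the script to convert the JSON to CSV, it fails with the error shown in the traceback below, and I can't figure out where the problem is.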

Use the json_converter.py script on the business and review datasets to convert them into csv files. This script requires Python version 2 and simple json (I took this from another repo and made a few quick attempts to get it working with Python 3, but it was becoming a bottleneck for me and it works fine if you use Python 2 + pip2 install simplejson).

Traceback:

Traceback (most recent call last):
  File "json_converter.py", line 115, in <module>
    column_names = get_superset_of_column_names_from_file(json_file)
  File "json_converter.py", line 28, in get_superset_of_column_names_from_file
    line_contents = json.loads(line)
  File "D:\Python27\lib\site-packages\simplejson\__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "D:\Python27\lib\site-packages\simplejson\decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "D:\Python27\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
  File "D:\Python27\lib\site-packages\simplejson\scanner.py", line 79, in scan_once
    return _scan_once(string, idx)
  File "D:\Python27\lib\site-packages\simplejson\scanner.py", line 70, in _scan_once
    raise JSONDecodeError(errmsg, string, idx)
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I also downloaded a dataset from Yelp; according to the GitHub repo, that is the data they use.

Test JSON:

[
    {
        "review_id": "1",
        "business_id": "1",
        "stars": 5,
        "text": "It was so much fun to read that I readed it again"
    },
    {
        "review_id": "2",
        "business_id": "1",
        "stars": 5,
        "text": "A classic How can you not like this one? The characters are very memorable, and we all liked it."
    },
    {
        "review_id": "3",
        "business_id": "2",
        "stars": 5,
        "text": " pretty nice story. and  very interesting characters"
    },
    {
        "review_id": "4",
        "business_id": "1",
        "stars": 5,
        "text": "Awesome! for children and a time travel for elders, really a simple language and beautiful descriptions makes the work very interesting."
    },
    {
        "review_id": "5",
        "business_id": "1",
        "stars": 5,
        "text": "A fascinating read for anyone who would think to breed a horse for want of a another for whatever purpose that desired and so realize that the blood line means little if the sire or dame should not be suited for breeding purposes in case they should pass on unwanted traits"
    },
    {
        "review_id": "6",
        "business_id": "1",
        "stars": 5,
        "text": "The Arabian Nights I read when I was young were like disney-fied. I'm excited to read the real version of the tales."
    },
    {
        "review_id": "7",
        "business_id": "2",
        "stars": 5,
        "text": "Just a string of short boring stories. It looks like some Sindbad is also in there, but I got bored before I got to it."
    }
]

1 answer:

Answer 0 (score: 1)

You are passing each line of the file to json.loads separately, and that is what causes the error.
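To illustrate the point, here is a minimal sketch that feeds a pretty-printed JSON array to json.loads one line at a time and then as a whole. It uses the standard-library json module in place of simplejson, and the sample data and the try_parse helper are made up for the demonstration.

```python
import json

# A small pretty-printed array shaped like the test file (illustrative data).
pretty_printed = "\n".join([
    "[",
    "    {",
    '        "review_id": "1",',
    '        "stars": 5',
    "    }",
    "]",
])


def try_parse(text):
    """Return the parsed value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # JSON decode errors are ValueError subclasses
        return None


# No single line of the pretty-printed array is a complete JSON document.
per_line = [try_parse(line) for line in pretty_printed.splitlines()]

# The full text, parsed in one call, is valid.
whole = try_parse(pretty_printed)
```

Every entry in per_line is None because fragments such as "[" or "{" cannot be decoded on their own, while whole holds the list of records.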

json.loads() expects a complete JSON string, so you have to read the entire file content with fin.read() and pass that to json.loads(). See the solution below:

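import simplejson as json


def get_superset_of_column_names_from_file(json_file_path):
    """Read in the json dataset file and return the superset of column names."""
    column_names = set()
    with open(json_file_path) as fin:
        # Parse the whole file in one call instead of line by line.
        contents = json.loads(fin.read())
    # The test file is a JSON array, so wrap a lone object to keep one code path.
    records = contents if isinstance(contents, list) else [contents]
    for record in records:
        # Assumption: get_column_names (defined elsewhere in json_converter.py)
        # takes a single record (a dict) and returns a dict keyed by column name.
        column_names.update(set(get_column_names(record).keys()))
    return column_names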
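
The line-by-line loop in the original script presumes a file with one JSON object per line, whereas the test file is a single JSON array. If both layouts need to be supported, something like the following sketch could work. load_records and superset_of_keys are hypothetical helpers, not part of json_converter.py, and the standard-library json module stands in for simplejson.

```python
import json


def load_records(json_file_path):
    """Return a list of records from a JSON array file or a one-object-per-line file."""
    with open(json_file_path) as fin:
        text = fin.read()
    try:
        data = json.loads(text)
    except ValueError:
        # Not a single JSON document: fall back to one JSON object per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A lone object is wrapped so callers always get a list.
    return data if isinstance(data, list) else [data]


def superset_of_keys(records):
    """Collect every top-level key that appears in any record."""
    keys = set()
    for record in records:
        keys.update(record.keys())
    return keys
```

Unlike get_column_names, this sketch only looks at top-level keys and does not flatten nested objects.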