How to convert single quotes to double quotes in Python to format a string as JSON

Time: 2017-12-05 17:55:16

Tags: python json regex double-quotes single-quotes

I have a file in which every line holds text representing the cast of a movie: a Python-style list of dictionaries whose keys and most values are single-quoted, with entries such as 'character': "Roger 'Verbal' Kint" and 'character': 'Edie's Finneran'.

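I need to convert it into a valid JSON string, so only the necessary single quotes should be converted to double quotes (for example, the single quotes around the word Verbal must not be converted, and apostrophes inside the text must not be converted either).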
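To illustrate why a blanket replacement is not enough, here is a minimal sketch. The sample line is a hypothetical one modeled on the data described above, not the actual file content.

```python
import json

# Hypothetical sample line, modeled on the data described in the question.
line = """[{'cast_id': 23, 'character': "Roger 'Verbal' Kint"}]"""

# Naive approach: turn every single quote into a double quote.
naive = line.replace("'", '"')
print(naive)

# The quotes around Verbal now terminate the string early, so parsing fails.
try:
    json.loads(naive)
    naive_is_valid = True
except json.JSONDecodeError:
    naive_is_valid = False

print(naive_is_valid)
```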

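I am using Python 3.x. I need to find a regular expression that converts only the right single quotes to double quotes, so that the whole text becomes a valid JSON string. Any ideas?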

4 answers:

Answer 0: (score: 3)

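First of all, the line you gave as an example is not parseable! … 'Edie's Finneran' … contains a syntax error, no matter what.

Assuming you control the input, you can simply read the file with eval(). (Although, in that case, one wonders why you cannot produce valid JSON in the first place...)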

>>> f = open('list.txt', 'r')
>>> s = f.read().strip()
>>> l = eval(s)

>>> import pprint
>>> pprint.pprint(l)
[{'cast_id': 23,
  'character': "Roger 'Verbal' Kint",
  ...
  'profile_path': '/b1pjkncyLuBtMUmqD1MztD2SG80.jpg'}]

>>> import json
>>> json.dumps(l)
'[{"cast_id": 23, "character": "Roger \'Verbal\' Kint", "credit_id": "52fe4260ca36847f8019af7", "gender": 2, "id": 1979, "name": "Kevin Spacey", "order": 5, "profile_path": "/x7wF050iuCASefLLG75s2uDPFUu.jpg"}, {"cast_id": 27, "character": "Edie\'s Finneran", "credit_id": "52fe4260c3a36847f8019b07", "gender": 1, "id": 2179, "name": "Suzy Amis", "order": 6, "profile_path": "/b1pjkncyLuBtMUmqD1MztDSG80.jpg"}]'

If you do not control the input, this is very dangerous, because it opens you up to code injection attacks.
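A harmless sketch of that risk: the "data" string below is a hypothetical stand-in for a malicious line, and the explicit namespace passed to eval() is only there to keep the demonstration contained.

```python
# A list that records whether the "data" managed to run code.
executed = []

# Hypothetical untrusted line: it looks like data but is really an expression
# with a side effect. list.append() returns None, so the result is [].
untrusted = "executed.append('side effect') or []"

# eval() runs the expression instead of merely reading it.
result = eval(untrusted, {"executed": executed})

print(result)
print(executed)
```

A real attacker would of course call something far less benign than list.append().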

I cannot stress enough that the best solution is to generate valid JSON in the first place.
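As a sketch of what that could look like, assuming the file is written by a Python program that currently dumps the repr of its data (the record below is hypothetical):

```python
import json

# Hypothetical record, modeled on the cast data in the question.
cast = [{"cast_id": 23, "character": "Roger 'Verbal' Kint"}]

as_repr = str(cast)         # Python repr: single-quoted keys, not valid JSON
as_json = json.dumps(cast)  # valid JSON, safe to write to the file

print(as_repr)
print(as_json)

# The JSON form round-trips through a standard parser.
round_tripped = json.loads(as_json)
```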

Answer 1: (score: 0)

The following code produces the desired output:

import ast


def getJson(filepath):
    lines = []
    # Use a context manager so the file is closed after reading.
    with open(filepath, 'r') as fr:
        for line in fr:
            line_split = line.split(",")
            set_line_split = []
            for part in line_split:
                i_split = part.split(":")
                i_set_split = []
                for split_i in i_split:
                    set_split_i = ""
                    rev = ""
                    pos = 0
                    # Copy characters up to and including the first quote.
                    for ch in split_i:
                        set_split_i += ch
                        pos += 1
                        if ch in ('"', "'"):
                            break
                    # Walk the remainder backwards: keep the last quote as the
                    # closing delimiter and backslash-escape every other quote.
                    i_rev = split_i[pos:][::-1]
                    state = False
                    for ch in i_rev:
                        if ch in ('"', "'") and not state:
                            rev += ch
                            state = True
                        elif ch in ('"', "'"):
                            rev += ch + "\\"
                        else:
                            rev += ch
                    set_split_i += rev[::-1]
                    i_set_split.append(set_split_i)
                set_line_split.append(":".join(i_set_split))
            line_modified = ",".join(set_line_split)
            lines.append(ast.literal_eval(line_modified))
    return lines


lines = getJson('test.txt')
for entry in lines:
    print(entry)

Answer 2: (score: 0)

Besides eval() (mentioned in user3850's answer), you can also use ast.literal_eval.

This has been discussed in the thread "Using python's eval() vs. ast.literal_eval()?".
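A minimal sketch of that approach, assuming each line is a valid Python literal. The helper name line_to_json and both sample lines are hypothetical.

```python
import ast
import json


def line_to_json(line):
    """Parse a Python-literal line and re-serialize it as JSON (hypothetical helper)."""
    return json.dumps(ast.literal_eval(line.strip()))


# Hypothetical sample line, modeled on the data in the question.
sample = """[{'cast_id': 23, 'character': "Roger 'Verbal' Kint", 'gender': 2}]"""
converted = line_to_json(sample)
print(converted)

# A line with an unescaped apostrophe is not a valid literal and is rejected.
try:
    line_to_json("[{'character': 'Edie's Finneran'}]")
    broken_rejected = False
except (ValueError, SyntaxError):
    broken_rejected = True
print(broken_rejected)
```

Unlike eval(), ast.literal_eval only accepts literals (strings, numbers, tuples, lists, dicts, sets, booleans, None), so an expression that calls a function raises an error instead of running.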

You may also want to look at the following discussion threads from a Kaggle competition whose data is similar to what the OP describes:

https://www.kaggle.com/c/tmdb-box-office-prediction/discussion/89313#latest-517927 https://www.kaggle.com/c/tmdb-box-office-prediction/discussion/80045#latest-518338

Answer 3: (score: 0)

If you do not control the JSON data, do not use eval().

I created a simple JSON correction mechanism, because it is safer:

def correctSingleQuoteJSON(s):
    rstr = ""
    escaped = False

    for c in s:

        if c == "'" and not escaped:
            c = '"'  # replace single with double quote

        elif c == "'" and escaped:
            rstr = rstr[:-1]  # remove escape character before single quotes

        elif c == '"':
            c = '\\' + c  # escape existing double quotes

        escaped = (c == "\\")  # check for an escape character
        rstr += c  # append the corrected character

    return rstr

You can use the function by passing its result to a JSON parser, for example json.loads(correctSingleQuoteJSON(s)).
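A sketch of how the function might be exercised. Both input strings are hypothetical, and the function body is repeated only so the snippet runs on its own.

```python
import json


def correctSingleQuoteJSON(s):
    # Same logic as the function above, repeated so the snippet is self-contained.
    rstr = ""
    escaped = False
    for c in s:
        if c == "'" and not escaped:
            c = '"'
        elif c == "'" and escaped:
            rstr = rstr[:-1]
        elif c == '"':
            c = '\\' + c
        escaped = (c == "\\")
        rstr += c
    return rstr


# Hypothetical input: single-quoted strings, inner apostrophe backslash-escaped.
s = "{'name': 'Edie\\'s Finneran', 'id': 2179}"
parsed = json.loads(correctSingleQuoteJSON(s))
print(parsed)

# Hypothetical input with a value that is already double-quoted: its delimiters
# get escaped and the inner single quotes get converted, so parsing fails.
mixed = "{'character': \"Roger 'Verbal' Kint\"}"
try:
    json.loads(correctSingleQuoteJSON(mixed))
    mixed_ok = True
except json.JSONDecodeError:
    mixed_ok = False
print(mixed_ok)
```

The second case suggests the function fits input where every string is single-quoted and inner apostrophes are backslash-escaped. Lines that mix quoting styles, like the one in the question, appear to need a real parser instead.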