Convert a key-value string with arrays to Python

Time: 2021-03-05 10:52:04

Tags: python arrays json algorithm dictionary

I have a (flat) text string that I want to translate into a Python dictionary/JSON.

Example string:

key1=value key2="val ue" key3=[entry1, entry2] key4=["o ne", "[two]"] key5="value with a , or secial character#l" key6="text with a protected quotation \" inside" key7=1,101,42

The output should be a dict/JSON that looks like this:

{
"key1": "value",
"key2": "val ue",
"key3": ["entry1", "entry2"],
"key4": ["o ne", "[two]"],
"key5": "value with a , or secial character#l",
"key6":"text with a protected quotation \" inside",
"key7": [1,101,42]
}

I am using the lexer described at https://www.debugcn.com/en/article/15212391.html, but I am stuck on how to make it work with brackets...

    import shlex

    def parse_kv_pairs(text, value_sep="="):
        lexer = shlex.shlex(text, posix=True)
        lexer.whitespace = " "
        lexer.wordchars += "="
        lexer.quotes = "\""
        lexer.wordchars += ".-_()/:+*^&%$#@!?|{}[]'`´,"
        # Each token looks like key=value; split on the first separator only
        return dict(word.split(value_sep, maxsplit=1) for word in lexer)

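Do you know of a library that supports this, or do you have an algorithm that can do this translation?

I'd be glad of any success :)

2 answers:

Answer 0: (score: 0)

I used regular expressions to try to work out what you want. I stuck to all-lowercase keys, as in the example, and added a few extra problem keys of my own for testing.

I assumed that any commas inside a number can be stripped, and that any whitespace character is equivalent to a single space, which allows a long input to be split across lines with extra newlines at spaces (or not; that part can be removed). The code runs, and the final assertion shows what it produces.

Lists cannot be nested.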

# -*- coding: utf-8 -*-
"""
https://stackoverflow.com/questions/66491209/translate-key-value-string-with-arrays-into-json-object-in-python

Created on Fri Mar  5 18:52:01 2021

@author: paddy3118
"""
import re

data = r"""
key1=value key2="val ue" key3=[entry1, entry2] key4=["o ne", "[two]"]
key5="value with a , or secial character#l"
key6="text with a protected quotation \" inside" key7=1,101,42
key8=key9 key10="not a key0=whatewver"
"""
data = data.strip()
state = 'KEY'
d = {}  # dict for parsed data
while data:
    if state == 'KEY':
        if not (m := re.search(r'^([a-z0-9]+)=', data)):
            break  # d, data
        key = m.groups()[0]
        data = data[m.end():]
        state = 'VAL'
    if state in {'VAL', 'LISTVAL'}:
        if (m := re.search(r'^([a-z][a-z0-9]+)[\s,]*', data)):
            val = m.groups()[0]
            if state == 'VAL':
                d[key] = val
                state = 'KEY'
            else:
                listval.append(val)
            data = data[m.end():]
        elif (m:= re.search(r'^"(.*?[^\\])"[\s,]*', data)):
            val = m.groups()[0]
            if state == 'VAL':
                d[key] = val
                state = 'KEY'
            else:
                listval.append(val)
            data = data[m.end():]
        elif (m:= re.search(r'^([0-9][0-9.,]*[^,])[\s,]*', data)):
            val = m.groups()[0]
            val = float(val.replace(',', ''))
            val = int(val) if val.is_integer() else val
            if state == 'VAL':
                d[key] = val
                state = 'KEY'
            else:
                listval.append(val)
            data = data[m.end():]
        elif state == 'VAL' and data[0] == '[':
            listval = []
            state = 'LISTVAL'
            data = data[1:].lstrip()
        elif state == 'LISTVAL' and data[0] == ']':
            d[key] = listval
            state = 'KEY'
            data = data[1:].lstrip()
        else:
            break

assert d == {'key1': 'value',
 'key2': 'val ue',
 'key3': ['entry1', 'entry2'],
 'key4': ['o ne', '[two]'],
 'key5': 'value with a , or secial character#l',
 'key6': 'text with a protected quotation \\" inside',
 'key7': 110142,
 'key8': 'key9',
 'key10': 'not a key0=whatewver'}
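
Note that the number branch above treats commas as thousands separators, so key7 comes out as 110142 rather than the [1, 101, 42] shown in the question. If the list form is what is wanted, the matched number text could be split on commas instead. The following is only a minimal sketch under that assumption; the helper name parse_number_list is hypothetical and not part of the answer's code, and it always returns a list, even for a single number.

```python
def parse_number_list(text):
    """Hypothetical helper: turn '1,101,42' into [1, 101, 42]."""
    numbers = []
    for part in text.split(','):
        value = float(part)  # float() tolerates surrounding whitespace
        # Keep whole numbers as int, everything else as float
        numbers.append(int(value) if value.is_integer() else value)
    return numbers


print(parse_number_list("1,101,42"))
```

With this approach a value such as 1,000 can no longer be read as one thousand, so the two interpretations are mutually exclusive.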

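Answer 1: (score: 0)

We can assume that the values (after the equals sign) are JSON-compatible, with two exceptions:

  • Words may be unquoted
  • Lists may lack square brackets (they are recognized by the comma separator)

So, if we can capture the part after the equals sign, we can:

  1. Identify the quoted strings
  2. Wrap every unquoted word (one that starts with a letter) in double quotes
  3. Parse the result as JSON
  4. If the previous step fails, wrap it in square brackets and parse it as JSON again

Here is the suggested code: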

import re
import json

def parse(s):
    d = {}
    key = value = ""
    for m in re.findall(r'"(?:[^"\\]|\\.)*"|\w+=?|\S', s) + ["="]:
        if m[-1] == '=':  # Arrived at a new key/value pair
            if key:  # Process previous key/value pair
                try:
                    d[key] = json.loads(value)
                except Exception: # Try with brackets, if that fails: input is bad
                    d[key] = json.loads("[{}]".format(value))
            key = m[:-1]  # New key
            value = ""
        elif m[0].isalpha():  # Wrap in quotes
            value += '"{}"'.format(m)
        else:  # Punctuation, digits, ...
            value += m
    return d
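
To see what the loop in parse is iterating over, it can help to look at the tokens the regular expression produces on its own. The sketch below reuses the same pattern; the names TOKEN_RE and tokenize are illustrative only and do not appear in the answer.

```python
import re

# Same pattern as in parse(): a quoted string, a word with an optional
# trailing '=', or any single non-whitespace character.
TOKEN_RE = r'"(?:[^"\\]|\\.)*"|\w+=?|\S'


def tokenize(s):
    """Illustrative helper: return the raw tokens that parse() would see."""
    return re.findall(TOKEN_RE, s)


print(tokenize('key3=[entry1, entry2] key7=1,101,42'))
# → ['key3=', '[', 'entry1', ',', 'entry2', ']', 'key7=', '1', ',', '101', ',', '42']
```

Tokens ending in = start a new key, tokens starting with a letter get quoted, and everything else (brackets, commas, digits, already quoted strings) is appended unchanged.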

For the sample data you provided, this is how you would call the function:

s = r'key1=value key2="val ue" key3=[entry1, entry2] key4=["o ne", "[two]"] key5="value with a , or secial character#l" key6="text with a protected quotation \" inside" key7=1,101,42'

result = parse(s)

The result will be:

{
   'key1': 'value', 
   'key2': 'val ue', 
   'key3': ['entry1', 'entry2'], 
   'key4': ['o ne', '[two]'], 
   'key5': 'value with a , or secial character#l', 
   'key6': 'text with a protected quotation " inside', 
   'key7': [1, 101, 42]
}
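
Since the question asks for a dict/JSON, the resulting dictionary can be serialized with the standard json module. This is a small sketch that hard-codes the result shown above so it runs on its own; the variable names are illustrative.

```python
import json

# The dictionary from the result above, written out literally
result = {
    'key1': 'value',
    'key2': 'val ue',
    'key3': ['entry1', 'entry2'],
    'key4': ['o ne', '[two]'],
    'key5': 'value with a , or secial character#l',
    'key6': 'text with a protected quotation " inside',
    'key7': [1, 101, 42],
}

json_text = json.dumps(result, indent=2)  # JSON text with escaped quotes
print(json_text)

# Parsing the text again gives back an equal dictionary
assert json.loads(json_text) == result
```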