Question

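I have this input file, and I want to convert it to JSON.

1. As you can see, the key: value pairs are laid out row-wise rather than column-wise.

2. Each record has a "Comments" key whose value is spread over several lines within that record. Some users may write lengthy comments.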

key,values

heading,A
Title,1
ID,12
Owner,John
Status,Active
Comments,"Im just pissed "
        ,"off from your service"
,
heading,B
Title,2
ID,21
Owner,Von
Status,Active
Comments,"Service is  "
        ,"really great"
        ,"I just enjoyed my weekend"
,
heading,C
Title,3
ID,31
Owner,Jesse
Status,Active
Comments,"Service"
        ,"needs to be"
        ,"improved"

Desired output:

{{'heading':'A','Title':1,'ID':12,'Owner':'John','Status':'Active', "Comments":"Im just pissed off from your service"},
{....}, 
{.....}}

Because my CSV file stores each "key": "value" pair on its own row, I am really at a loss as to how to convert it to JSON.

=====What I tried=====

import csv

f = open( 'csv_sample.csv', 'rU' )
reader = csv.DictReader( f, fieldnames = ( "key","value" ))
for i in reader:
    print i


{'value': 'values', 'key': 'key'}
{'value': 'A', 'key': 'heading'}
{'value': '1', 'key': 'Title'}
{'value': '12', 'key': 'ID'}
{'value': 'John', 'key': 'Owner'}
{'value': 'Active', 'key': 'Status'}

As you can see, that is not what I want. Please help.

Answer 1

Edit: maybe try something along these lines:

import json

def headingGen(lines):
    newHeading = {}
    for line in lines:
        try:
            k, v = line.strip('\n').split(',', 1)
            v = v.strip('"')
            if not k and not v:
                yield newHeading
                newHeading = {}
            elif not k.strip():
                newHeading[prevk] = newHeading[prevk] + v
            else:
                prevk = k
                newHeading[k] = v
        except Exception as e:
            print("I had a problem at line "+line+" : "+str(e))
    yield newHeading


def file_to_json(filename):
    with open(filename, 'r') as fh:
        next(fh)
        next(fh)
        return json.dumps(list(headingGen(fh)))

Answer 2

Try this:

def convert_to_json(fname):
    result = []
    rec = {}
    with open(fname) as f:
        for l in f:
            if not l.strip() or l.startswith('key'):
                continue

            if l.startswith(','):
                result.append(rec)
                rec = {}
            else:
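                # Split on the first comma only, so commas inside a value do not break unpacking.
                k, v = l.strip().split(',', 1)
                if k.strip():
                    try:
                        rec[k] = int(v)
                    except ValueError:
                        # Not a number: keep it as a string without the surrounding quotes.
                        rec[k] = v.strip('"')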
                else:
                    rec['Comments'] += v.strip('"')
    result.append(rec)
    return result

print convert_to_json('./csv_sample.csv')

Output:

[{'Status': 'Active', 'Title': 1, 'Comments': 'Im just pissed off from your service', 'heading': 'A', 'Owner': 'John', 'ID': 12}, {'Status': 'Active', 'Title': 2, 'Comments': 'Service is  really greatI just enjoyed my weekend', 'heading': 'B', 'Owner': 'Von', 'ID': 21}, {'Status': 'Active', 'Title': 3, 'Comments': 'Serviceneeds to beimproved', 'heading': 'C', 'Owner': 'Jesse', 'ID': 31}]
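
Note that convert_to_json() returns a Python list of dictionaries, not JSON text, which is why the output above uses single quotes. A minimal sketch of the remaining serialization step, assuming Python 3; the `records` list here is a hand-written stand-in for the function's return value, not something read from the file:

```python
import json

# Hypothetical stand-in for the list returned by convert_to_json().
records = [
    {"heading": "A", "Title": 1, "ID": 12, "Owner": "John",
     "Status": "Active", "Comments": "Im just pissed off from your service"},
]

# json.dumps turns the list of dicts into a JSON array of objects.
json_text = json.dumps(records, indent=2)
print(json_text)
```

Integers such as Title and ID stay numeric in the JSON text because the function already converted them with int().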

Answer 3

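This answer uses Python's list comprehensions to offer a functional-style alternative to the other (also good) answers, which use an imperative style. I like this style because it separates the different aspects of the problem nicely.

The nested list comprehension builds the result in three steps: it splits the input into sections, splits each section into items with a regular expression, and applies the function split_item() to each item to obtain the key/value pairs from which each section's dictionary is constructed.

The source data is read section by section for memory efficiency.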

import re
import json

# Define a regular expression splitting a section into items.
# Each newline which is not followed by whitespace splits.
splitter = re.compile(r'\n(?!\s)')

def section_generator(f):
    # Generator reading a single section from the input file in each iteration.
    # The sections are separated by a comma on a separate line.
    section = ''
    for line in f:
        if line == ',\n':
            yield section
            section = ''
        else:
            section += line
    yield section

def split_item(item):
    # Convert the item containing "key,value" into a key/value pair.
    key, value = item.split(',', 1)
    if value.startswith('"'):
        # Convert multiline quoted string to unquoted single line.
        value = ''.join(line.strip().lstrip(',').strip('"')
                        for line in value.strip().splitlines())
    elif value.isdigit():
        # Convert numeric value to int.
        value = int(value)
    return key, value

with open('csv_sample.csv', 'rU') as f:
    # Ignore the "header" (skip everything until the empty line is found).
    for line in f:
        if line == '\n':
            break

    # Construct the resulting list of dictionaries using list comprehensions.
    result = [dict(split_item(item) for item in splitter.split(section) if item)
              for section in section_generator(f)]

print json.dumps(result)
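
To see what the splitter regular expression does in isolation, here is a small sketch assuming Python 3. The `section` string is copied from the sample data in the question; the name `items` is introduced only for this illustration:

```python
import re

# Same pattern as above: split at every newline that is NOT followed by whitespace.
splitter = re.compile(r'\n(?!\s)')

section = (
    'heading,C\n'
    'Title,3\n'
    'Comments,"Service"\n'
    '        ,"needs to be"\n'
    '        ,"improved"\n'
)

# Continuation lines start with spaces, so the newline before them does not split.
# The trailing newline produces an empty item, which is filtered out.
items = [item for item in splitter.split(section) if item]

for item in items:
    print(repr(item))
```

The whole multi-line comment stays in a single item, which is what lets split_item() join its lines afterwards.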

Answer 4

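This is not a trivial conversion, so we need to specify it fully:

The input file is a CSV file with two columns named key and values
A record is made up of several rows, each defining a key and the value mapped to it
The key heading marks the beginning of a record
A blank key marks a continuation line: its value should be appended to the previous value
If the continuation line's value does not start with a separator and the previous value does not end with one, a space is inserted (separators are space, tab, dot, comma and -)
The heading field cannot have continuation lines, which allows simpler decoding

The code could be: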

import csv
import json

with open('csv_sample.csv') as fd:
    rd = csv.DictReader(fd)
    rec = None
    lastkey = None
    sep = ' \t,.-'
    for row in rd:
        # print row
        key = row['key'].strip()
        if key == 'heading':
            if rec is not None:
                # process previous record
                print json.dumps(rec)
            rec = { key: row['values'] }
        elif key == '': # continuation line
            # No extra space if the previous value ends with, or the new value starts with, a separator.
            if (rec[lastkey][-1:] in sep) or (row['values'][:1] in sep):
                rec[lastkey] += row['values']
            else:
                rec[lastkey] += ' ' + row['values']
        else:
            # normal field: add it to rec and store key
            rec[key] = row['values']
            lastkey = key
    # process last record
    if rec is not None:
        print json.dumps(rec)

You can easily turn this into a generator by wrapping the loop in a function and replacing print json.dumps(rec) with yield json.dumps(rec).
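
A minimal sketch of that generator variant, assuming Python 3. The function name `iter_records`, the constant `SEPARATORS`, and the in-memory `sample` are introduced here for illustration and are not part of the original answer; the sketch also adds guards for rows that appear before the first heading:

```python
import csv
import io
import json

SEPARATORS = ' \t,.-'

def iter_records(lines):
    """Yield one JSON string per record; `lines` is any iterable of CSV lines."""
    rec = None
    lastkey = None
    for row in csv.DictReader(lines):
        key = row['key'].strip()
        value = row['values'] or ''
        if key == 'heading':
            if rec is not None:
                yield json.dumps(rec)      # previous record is complete
            rec = {key: value}
            lastkey = None                 # heading cannot be continued
        elif rec is None:
            continue                       # ignore rows before the first heading
        elif key == '':
            if lastkey is None:
                continue                   # nothing to continue
            # Insert a space unless either side already has a separator.
            if rec[lastkey][-1:] in SEPARATORS or value[:1] in SEPARATORS:
                rec[lastkey] += value
            else:
                rec[lastkey] += ' ' + value
        else:
            rec[key] = value
            lastkey = key
    if rec is not None:
        yield json.dumps(rec)              # last record

sample = io.StringIO(
    'key,values\n'
    '\n'
    'heading,B\n'
    'Title,2\n'
    'Comments,"Service is  "\n'
    '        ,"really great"\n'
    '        ,"I just enjoyed my weekend"\n'
    ',\n'
    'heading,C\n'
    'Comments,"Service"\n'
    '        ,"needs to be"\n'
    '        ,"improved"\n'
)

records = [json.loads(text) for text in iter_records(sample)]
for rec in records:
    print(rec)
```

Passing an open file object instead of `sample` should work the same way, since csv.DictReader accepts any iterable of lines.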

With your example, it gives:

{"Status": "Active", "Title": "1", "Comments": "Im just pissed off from your service", "heading": "A", "Owner": "John", "ID": "12"}
{"Status": "Active", "Title": "2", "Comments": "Service is  really great I just enjoyed my weekend", "heading": "B", "Owner": "Von", "ID": "21"}
{"Status": "Active", "Title": "3", "Comments": "Service needs to be improved", "heading": "C", "Owner": "Jesse", "ID": "31"}

Because this code uses the csv module, it is by construction immune to commas in the comments.
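
A short sketch of the difference, assuming Python 3. The sample `line` with a comma inside the quoted comment is made up for this illustration:

```python
import csv
import io

# Hypothetical row whose quoted comment contains a comma.
line = 'Comments,"Good, but slow"\n'

# Naive splitting cuts the comment in two and keeps the quotes.
naive = line.strip().split(',')

# csv.reader honours the quotes and returns exactly two fields.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```

The naive split yields three pieces, so unpacking into a key and a value would fail, whereas the csv module returns the key and the full comment.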
