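I have this input file and I want to convert it to JSON.
1. As you can see, the key:value pairs are laid out row-wise rather than column-wise.
2. Every record has a "Comments" key whose value is spread over several rows of that record. Some users may write lengthy comments.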
key,values
heading,A
Title,1
ID,12
Owner,John
Status,Active
Comments,"Im just pissed "
,"off from your service"
,
heading,B
Title,2
ID,21
Owner,Von
Status,Active
Comments,"Service is "
,"really great"
,"I just enjoyed my weekend"
,
heading,C
Title,3
ID,31
Owner,Jesse
Status,Active
Comments,"Service"
,"needs to be"
,"improved"
Desired output:
{{'heading':'A','Title':1,'ID':12,'Owner':'John','Status':'Active', "Comments":"Im just pissed off from your service"},
{....},
{.....}}
Because my CSV file stores the "key":"value" pairs row-wise, I am really at a loss as to how to convert it to JSON.
=====What I tried=====
import csv

f = open('csv_sample.csv', 'rU')
reader = csv.DictReader(f, fieldnames=("key", "value"))
for i in reader:
    print(i)
{'value': 'values', 'key': 'key'}
{'value': 'A', 'key': 'heading'}
{'value': '1', 'key': 'Title'}
{'value': '12', 'key': 'ID'}
{'value': 'John', 'key': 'Owner'}
{'value': 'Active', 'key': 'Status'}
As you can see, that is not what I want. Please help.
Answer 0: (score: 1)
Edit: maybe try something along these lines:
import json

def headingGen(lines):
    newHeading = {}
    for line in lines:
        try:
            k, v = line.strip('\n').split(',', 1)
            v = v.strip('"')
            if not k and not v:
                # a bare "," line ends the current record
                yield newHeading
                newHeading = {}
            elif not k.strip():
                # empty key: continuation of the previous key's value
                newHeading[prevk] = newHeading[prevk] + v
            else:
                prevk = k
                newHeading[k] = v
        except Exception as e:
            print("I had a problem at line "+line+" : "+str(e))
    yield newHeading

def file_to_json(filename):
    with open(filename, 'r') as fh:
        # skip the first two lines (assumes the header row is followed by a blank line)
        next(fh)
        next(fh)
        return json.dumps(list(headingGen(fh)))
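To try the grouping logic without a file, the same idea can be run on an in-memory list of lines. This is only a sketch for Python 3: the name records_from_lines and the sample lines are made up for illustration (the sample is cut down from the question's data), and it assumes a bare "," line (or a blank line) ends a record.

```python
import json

def records_from_lines(lines):
    """Group key,value lines into dicts; a bare ',' line ends a record."""
    record = {}
    prev_key = None
    for line in lines:
        # partition() never raises, even if the line has no comma
        key, _, value = line.rstrip('\n').partition(',')
        key, value = key.strip(), value.strip('"')
        if not key and not value:
            # separator line: emit the record collected so far
            if record:
                yield record
            record = {}
        elif not key:
            # continuation line: append to the value of the previous key
            record[prev_key] += value
        else:
            prev_key = key
            record[key] = value
    if record:
        yield record

# Hypothetical sample, shortened from the question's file
sample = [
    'heading,A\n',
    'Comments,"Im just pissed "\n',
    ',"off from your service"\n',
    ',\n',
    'heading,B\n',
    'Comments,"Service is "\n',
    ',"really great"\n',
]
records = list(records_from_lines(sample))
print(json.dumps(records))
```

Because the function accepts any iterable of lines, an open file handle can be passed in the same way as the list above.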
Answer 1: (score: 1)
Try this:
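def convert_to_json(fname):
    result = []
    rec = {}
    with open(fname) as f:
        for l in f:
            # skip blank lines and the "key,values" header
            if not l.strip() or l.startswith('key'):
                continue
            if l.strip() == ',':
                # a bare "," line closes the current record
                result.append(rec)
                rec = {}
            else:
                # split only on the first comma so comments may contain commas
                k, v = l.strip().split(',', 1)
                if k.strip():
                    try:
                        rec[k] = int(v)
                    except ValueError:
                        rec[k] = v.strip('"')
                else:
                    # empty key: continuation of the Comments value
                    rec['Comments'] += v.strip('"')
    result.append(rec)
    return result

print(convert_to_json('./csv_sample.csv'))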
Output:
[{'Status': 'Active', 'Title': 1, 'Comments': 'Im just pissed off from your service', 'heading': 'A', 'Owner': 'John', 'ID': 12}, {'Status': 'Active', 'Title': 2, 'Comments': 'Service is really greatI just enjoyed my weekend', 'heading': 'B', 'Owner': 'Von', 'ID': 21}, {'Status': 'Active', 'Title': 3, 'Comments': 'Serviceneeds to beimproved', 'heading': 'C', 'Owner': 'Jesse', 'ID': 31}]
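Note that the function returns a Python list of dicts, not a JSON string. A minimal sketch of the last step with json.dumps follows, written for Python 3; the records list is a hand-written stand-in for the function's return value, not real output.

```python
import json

# Hypothetical stand-in for the list returned by convert_to_json()
records = [
    {'heading': 'A', 'Title': 1, 'ID': 12, 'Owner': 'John',
     'Status': 'Active',
     'Comments': 'Im just pissed off from your service'},
]

# Serialize to a JSON string; indent and sort_keys only affect readability
text = json.dumps(records, indent=2, sort_keys=True)
print(text)

# Parsing the string back gives an equal list of dicts
round_tripped = json.loads(text)
```

The integers stay integers in the JSON text, so Title and ID come back as numbers rather than strings.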
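Answer 2: (score: 0)
This answer uses Python's list comprehensions to offer a functional-style alternative to the other (also good) answers, which use an imperative style. I like this style because it cleanly separates the different aspects of the problem.
The nested list comprehensions build the result by first splitting the input into sections, then splitting each section into items with a regular expression, and finally applying the function split_item() to each item to obtain the key/value pairs from which each section's dictionary is built.
The source data is read section by section for memory efficiency.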
import re
import json

# Define a regular expression splitting a section into items.
# Each newline which is not followed by whitespace splits.
splitter = re.compile(r'\n(?!\s)')

def section_generator(f):
    # Generator reading a single section from the input file in each iteration.
    # The sections are separated by a comma on a separate line.
    section = ''
    for line in f:
        if line == ',\n':
            yield section
            section = ''
        else:
            section += line
    yield section

def split_item(item):
    # Convert the item including "key,value" into a key/value pair.
    key, value = item.split(',', 1)
    if value.startswith('"'):
        # Convert multiline quoted string to unquoted single line.
        value = ''.join(line.strip().lstrip(',').strip('"')
                        for line in value.strip().splitlines())
    elif value.isdigit():
        # Convert numeric value to int.
        value = int(value)
    return key, value

with open('csv_sample.csv', 'rU') as f:
    # Ignore the "header" (skip everything until the empty line is found).
    for line in f:
        if line == '\n':
            break
    # Construct the resulting list of dictionaries using list comprehensions.
    result = [dict(split_item(item) for item in splitter.split(section) if item)
              for section in section_generator(f)]
    print(json.dumps(result))
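To see what the splitter expression does on its own, here is a small sketch on an in-memory section. It assumes the continuation lines start with whitespace, which is what the (?!\s) lookahead relies on; the section string is a made-up sample based on the question's data.

```python
import re

# Split at every newline that is NOT followed by whitespace
splitter = re.compile(r'\n(?!\s)')

# Hypothetical section; note the leading space on the continuation line
section = (
    'heading,A\n'
    'Title,1\n'
    'Comments,"Im just pissed "\n'
    ' ,"off from your service"\n'
)

# The trailing newline produces an empty last piece, which is filtered out
items = [item for item in splitter.split(section) if item]
for item in items:
    print(repr(item))
```

The multi-line comment stays together as one item, so split_item() receives the key and all of its continuation lines at once.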
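Answer 3: (score: 0)
This is not a simple conversion, so we first need to specify the format completely:
- the file has two columns, key and values
- a heading key marks the beginning of a record
- the heading field cannot have continuation lines; this allows simpler decoding
The code could be: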
import csv
import json

with open('csv_sample.csv') as fd:
    rd = csv.DictReader(fd)
    rec = None
    lastkey = None
    sep = ' \t,.-'
    for row in rd:
        # print(row)
        key = row['key'].strip()
        if key == 'heading':
            if rec is not None:
                # process previous record
                print(json.dumps(rec))
            rec = { key: row['values'] }
        elif key == '': # continuation line
            if (rec[lastkey][-1] in sep) or (row['values'] in sep):
                rec[lastkey] += row['values']
            else:
                rec[lastkey] += ' ' + row['values']
        else:
            # normal field: add it to rec and store key
            rec[key] = row['values']
            lastkey = key
    # process last record
    if rec is not None:
        print(json.dumps(rec))
You can easily turn this into a generator (inside a function) by replacing each print of json.dumps(rec) with yield json.dumps(rec).
With your example, it gives:
{"Status": "Active", "Title": "1", "Comments": "Im just pissed off from your service", "heading": "A", "Owner": "John", "ID": "12"}
{"Status": "Active", "Title": "2", "Comments": "Service is really great I just enjoyed my weekend", "heading": "B", "Owner": "Von", "ID": "21"}
{"Status": "Active", "Title": "3", "Comments": "Service needs to be improved", "heading": "C", "Owner": "Jesse", "ID": "31"}
Because this code uses the csv module, it is by construction immune to commas in the comments.
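A sketch of the generator variant, which also checks the claim about commas, might look like this in Python 3. The function name iter_records and the sample data are made up for illustration, and unlike the suggestion above it yields the dicts themselves so that the caller decides when to call json.dumps.

```python
import csv
import io
import json

def iter_records(lines, sep=' \t,.-'):
    """Yield one dict per record; `lines` is any iterable of CSV lines."""
    rec = None
    lastkey = None
    for row in csv.DictReader(lines):
        key = row['key'].strip()
        value = row['values']
        if key == 'heading':
            # a new heading closes the previous record, if any
            if rec is not None:
                yield rec
            rec = {key: value}
        elif key == '':
            # continuation line: join with a space unless one side
            # already ends or consists of a separator character
            if rec[lastkey][-1] in sep or value in sep:
                rec[lastkey] += value
            else:
                rec[lastkey] += ' ' + value
        else:
            rec[key] = value
            lastkey = key
    if rec is not None:
        yield rec

# Hypothetical sample with a comma inside a quoted comment
sample = io.StringIO(
    'key,values\n'
    'heading,A\n'
    'Owner,John\n'
    'Comments,"Slow, but"\n'
    ',"it works"\n'
    ',\n'
    'heading,B\n'
    'Owner,Von\n'
)
records = list(iter_records(sample))
print(json.dumps(records))
```

The quoted field "Slow, but" keeps its comma because the csv module does the splitting, and the bare "," line is treated as an empty continuation, so it adds nothing to the comment.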