I have a txt file containing space-separated data, for example:
2017-05-16 00:44:36.151724381 +43.8187 -104.7669 -004.4 00.6 00.2 00.2 090 C
2017-05-16 00:44:36.246672534 +41.6321 -104.7834 +004.3 00.6 00.3 00.2 130 C
2017-05-16 00:44:36.356132768 +46.4559 -104.5989 -004.2 01.1 00.4 00.2 034 C
I want to convert it to JSON data like this:
{"dataset": "Lightning","observation_date": "20170516004436151", "location": { "type": "point", "coordinates": [43.8187, -104.7669]}} {"dataset": "Lightning","observation_date": "20170516004436246", "location": { "type": "point", "coordinates": [41.6321, -104.7834]}} {"dataset": "Lightning","observation_date": "20170516004436356", "location": { "type": "point", "coordinates": [46.4559, -104.5989]}}
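I have to add a 'dataset': 'Lightning' key/value pair, combine the date and time and strip their separators, and combine lat/lng into a dict, all before doing any JSON conversion.
But right now the date and time still come out with the "-" and ":" characters not stripped, like this: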
{"observation_date": "2017-05-1600:44:36.151724381", "location": {"type": "point", "coordinates": ["+43.8187", "-104.7669"]}, "dataset": "Lightning"} {"observation_date": "2017-05-1600:44:36.246672534", "location": {"type": "point", "coordinates": ["+41.6321", "-104.7834"]}, "dataset": "Lightning"} {"observation_date": "2017-05-1600:44:36.356132768", "location": {"type": "point", "coordinates": ["+46.4559", "-104.5989"]}, "dataset": "Lightning"}
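The two outputs also differ in the coordinates: the desired output has JSON numbers, while the current output has quoted strings such as "+43.8187". A minimal sketch of the fix, assuming the coordinates are always fields 3 and 4 of a line: convert them with float(), which accepts a leading + sign, before building the dict.

```python
import json

parts = "2017-05-16 00:44:36.151724381 +43.8187 -104.7669".split()
# float() parses "+43.8187" and "-104.7669"; json.dumps then writes bare numbers
coordinates = [float(parts[2]), float(parts[3])]
print(json.dumps({"type": "point", "coordinates": coordinates}))
# {"type": "point", "coordinates": [43.8187, -104.7669]}
```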
What I've coded so far:
import json
import sys

def convert(filename):
    dataDict = {}
    txtFile = filename[0]
    print "Opening TXT file: ",txtFile
    infile = open(txtFile, "r")
    for line in infile:
        lineStrip = line.strip()
        parts = [p.strip() for p in lineStrip.split()]
        date = parts[0].strip("-") #trying to get rid of "-" but not working
        time = parts[1].strip(":") #trying to get rid of ":" and "." but not working
        dataDict.update({"dataset":"Lightning"})
        dataDict.update({"observation_date": date + time})
        dataDict.update({"location": {"type":"point", "coordinates": [parts[2], parts[3]]}})
        json_filename = txtFile.split(".")[0]+".json"
        jsonf = open(json_filename,'a')
        data = json.dumps(dataDict)
        jsonf.write(data + "\n")
        print dataDict
    infile.close()
    jsonf.close()

if __name__=="__main__":
    convert(sys.argv[1:])
But I don't know how to strip the "-", "." and ":" characters, or how to put the "dataset": "Lightning" element first.
Answer 0 (score: 1)
This should work:
date = parts[0].replace("-", "")  # replace() removes every "-", not only leading/trailing ones
time = parts[1].replace(":", "").replace(".", "")  # removes every ":" and "."
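The reason strip() appeared to do nothing is that str.strip(chars) only removes the given characters from the two ends of the string, while str.replace() removes every occurrence. A quick comparison on the first sample line; clean_timestamp is a hypothetical helper name used only for illustration:

```python
def clean_timestamp(date_part, time_part):
    """Hypothetical helper: join date and time with all separators removed."""
    # replace() deletes every occurrence, not just leading/trailing ones
    date = date_part.replace("-", "")
    time = time_part.replace(":", "").replace(".", "")
    return date + time

# strip() only trims the ends; the dashes sit in the middle, so nothing changes
print("2017-05-16".strip("-"))                                   # 2017-05-16
print(clean_timestamp("2017-05-16", "00:44:36.151724381"))       # 20170516004436151724381
# the desired value keeps only milliseconds: the first 17 characters
print(clean_timestamp("2017-05-16", "00:44:36.151724381")[:17])  # 20170516004436151
```

Slicing to 17 characters assumes the date and time fields always have the fixed widths shown in the sample data.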
Answer 1 (score: 1)
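You should do this:
date = parts[0].replace('-', '')
time = parts[1].replace(':', '')
To get "dataset" first in the JSON, the simplest option is to sort the keys, since "dataset" sorts alphabetically before "location" and "observation_date":
data = json.dumps(dataDict, sort_keys=True)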
You should also consider writing dataDict["dataset"] = "Lightning" instead of using .update.
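A small sketch of both suggestions, using a record hand-built from the first sample line (the name record is illustrative). One caveat: sort_keys=True sorts nested objects too, so "location" comes out as {"coordinates": ..., "type": ...} rather than in the order shown in the question.

```python
import json

record = {}
# plain item assignment instead of dataDict.update({...})
record["observation_date"] = "20170516004436151"
record["location"] = {"type": "point", "coordinates": [43.8187, -104.7669]}
record["dataset"] = "Lightning"

# sort_keys=True orders keys alphabetically at every level:
# dataset < location < observation_date, and coordinates < type inside location
print(json.dumps(record, sort_keys=True))
# {"dataset": "Lightning", "location": {"coordinates": [43.8187, -104.7669], "type": "point"}, "observation_date": "20170516004436151"}
```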
Answer 2 (score: 1)
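Python dictionaries are unordered (in the Python 2 used in the question; since Python 3.7, plain dicts preserve insertion order), so you can't make the "dataset": "Lightning" element come first. For that I'd use an OrderedDict instead, or sort the JSON keys as others have mentioned.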
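A minimal sketch of the OrderedDict approach, with the values from the first sample line hard-coded for illustration. Because OrderedDict remembers insertion order, json.dumps emits the keys exactly as they were added:

```python
import json
from collections import OrderedDict

# OrderedDict keeps insertion order, so "dataset" is serialized first
record = OrderedDict()
record["dataset"] = "Lightning"
record["observation_date"] = "20170516004436151"
record["location"] = OrderedDict([("type", "point"),
                                  ("coordinates", [43.8187, -104.7669])])

print(json.dumps(record))
# {"dataset": "Lightning", "observation_date": "20170516004436151", "location": {"type": "point", "coordinates": [43.8187, -104.7669]}}
```

Unlike sort_keys=True, this also keeps "type" before "coordinates" inside "location". On Python 3.7+ a plain dict filled in the same order produces identical output.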
To format the time correctly, I'd use a datetime object:
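from datetime import datetime  # the class, so datetime.strptime exists
# %f accepts at most 6 fractional digits, so drop the nanosecond tail first
date_string = parts[0] + parts[1][:15]  # "2017-05-16" + "00:44:36.151724"
fmt = "%Y-%m-%d%H:%M:%S.%f"  # month before day; avoids shadowing built-in format()
dt = datetime.strptime(date_string, fmt)
# %f always prints 6 digits; cutting the last 3 keeps milliseconds
new_date_string = dt.strftime("%Y%m%d%H%M%S%f")[:-3]  # "20170516004436151"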
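Combining the datetime approach with the other suggestions (an OrderedDict for key order, item assignment instead of .update, float coordinates), a minimal end-to-end sketch of convert() might look like this. It is written for Python 3 and assumes every non-blank line has at least four space-separated fields; parse_line and the output naming are illustrative choices, not the asker's code.

```python
import json
import sys
from collections import OrderedDict
from datetime import datetime


def parse_line(line):
    """Turn one space-separated record into the target JSON object."""
    parts = line.split()
    # %f accepts at most 6 fractional digits, so drop the nanosecond tail
    dt = datetime.strptime(parts[0] + " " + parts[1][:15], "%Y-%m-%d %H:%M:%S.%f")
    record = OrderedDict()
    record["dataset"] = "Lightning"
    # keep milliseconds only, as in the desired output
    record["observation_date"] = dt.strftime("%Y%m%d%H%M%S%f")[:-3]
    # float() accepts the leading "+", so the coordinates become JSON numbers
    record["location"] = OrderedDict([
        ("type", "point"),
        ("coordinates", [float(parts[2]), float(parts[3])]),
    ])
    return record


def convert(txt_path):
    # illustrative naming, mirroring the question: data.txt -> data.json
    json_path = txt_path.rsplit(".", 1)[0] + ".json"
    with open(txt_path) as infile, open(json_path, "w") as outfile:
        for line in infile:
            if line.strip():  # skip blank lines
                outfile.write(json.dumps(parse_line(line)) + "\n")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        convert(sys.argv[1])
```

The output file is opened once in "w" mode rather than once per line in "a" mode, so rerunning the script overwrites the previous result instead of appending duplicate records, and the with block closes both files.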
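Using datetime objects helps because, if you keep processing the data instead of just writing out JSON, they work well with pandas and numpy. They also support arithmetic and time-zone localization if you need them.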
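As a small illustration of the arithmetic point, subtracting two datetime objects gives a timedelta. This sketch uses the first two sample strikes with their fractions truncated to microseconds:

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S.%f"
# first two sample strikes, fractions truncated to microseconds
first = datetime.strptime("2017-05-16 00:44:36.151724", fmt)
second = datetime.strptime("2017-05-16 00:44:36.246672", fmt)

gap = second - first          # subtracting datetimes yields a timedelta
print(gap.total_seconds())    # 0.094948
```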