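I'm trying to import a JSON file into Python to do some data analysis. Each JSON object has many different variables (roughly 7-10). Some objects have certain variables and others don't. I'm specifically interested in five variables on each JSON line, but some objects are missing data for them. How can I make the program set `None` for each missing value?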
```python
import json
import numpy as np

data = []
keys = ["hostid", "time", "userid", "link", "title"]
m = len(keys)
with open('test.json') as json_data:
    for line in json_data:
        dataline = json.loads(line)
        row = []
        for i in range(m):
            # Raises KeyError when this object lacks keys[i]
            row.append(dataline[keys[i]])
        data.append(row)
# The with block has already closed the file, so no close() call is needed.
data = np.array(data)
```
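Below are some sample JSON objects. As you can see, the first object contains all five variables I want, but the second object has no data for the "title" variable.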
```json
{
  "title": "Monster Man",
  "link": "http://monsters4ever.com/tagged/rosemary%27s%20baby%20(1968)",
  "userid": 130290,
  "field5": "lezmer Brunch at City Winery? Who Knew? -- Grub Street Chicago\"",
  "hostid": "3969937ab0a3e2db8690c482564006a7",
  "time": 376541
}
{
  "link": "http://www.sfgate.com/world/article/WORLD-News-of-the-Day-From-Across-the-Globe-4120318.php",
  "userid": 227954,
  "field5": "ries « SHEfinds\"",
  "hostid": "6096407936827c96fa0833f26ab33b76",
  "time": 376541
}
```
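Because the code reads the file line by line with `json.loads(line)`, each object has to sit on a single line of the file (the "JSON Lines" layout); the objects above are presumably pretty-printed only for display. A minimal sketch, assuming the two samples are stored one per line, that reports which of the five wanted keys each object lacks. `io.StringIO` stands in for `test.json`, and `field5` is left out of the sample strings for brevity.

```python
import io
import json

# Stand-in for test.json: the two sample objects, one JSON object per line
# (field5 omitted for brevity).
SAMPLE = io.StringIO(
    '{"title": "Monster Man", "link": "http://monsters4ever.com/tagged/rosemary%27s%20baby%20(1968)", '
    '"userid": 130290, "hostid": "3969937ab0a3e2db8690c482564006a7", "time": 376541}\n'
    '{"link": "http://www.sfgate.com/world/article/WORLD-News-of-the-Day-From-Across-the-Globe-4120318.php", '
    '"userid": 227954, "hostid": "6096407936827c96fa0833f26ab33b76", "time": 376541}\n'
)

keys = ["hostid", "time", "userid", "link", "title"]

missing_per_line = []
for line in SAMPLE:
    obj = json.loads(line)
    # Collect the wanted keys that this particular object does not contain.
    missing_per_line.append([k for k in keys if k not in obj])

print(missing_per_line)  # [[], ['title']]
```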
Can anyone help me?
Answer 0 (score: 6)
Rather than filling in the missing data, try using `x.get('field')` instead of the usual `x['field']` when you retrieve data from the object.

For example:
```python
with open('test.json') as json_data:
    for line in json_data:
        dataline = json.loads(line)
        row = []
        for key in keys:
            row.append(dataline.get(key))
        # better is:
        # row = [dataline.get(key) for key in keys]
        data.append(row)
```
This works because `dict.get` returns `None` if the key is not found in the dictionary.
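A short illustration of that behaviour: `get` returns `None` for an absent key, or its second argument if one is supplied, and never raises `KeyError`. The record is the second sample object with `link` and `field5` left out, so both `link` and `title` come back as `None`.

```python
# The second sample object, with link and field5 omitted.
record = {"userid": 227954, "hostid": "6096407936827c96fa0833f26ab33b76", "time": 376541}

keys = ["hostid", "time", "userid", "link", "title"]

# get() returns None when the key is absent instead of raising KeyError.
row = [record.get(key) for key in keys]
print(row)  # ['6096407936827c96fa0833f26ab33b76', 376541, 227954, None, None]

# An explicit default can be passed as the second argument.
print(record.get("title", "untitled"))  # untitled
```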
If you really don't want to do that, and you know which fields you want, you can use `dict.setdefault` to put `None` there:
```python
for field in fields_you_care_about:
    obj.setdefault(field, None)
```
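A minimal sketch of the `setdefault` route, reusing the answer's placeholder names `obj` and `fields_you_care_about`: each missing field is written into the dictionary itself with the value `None`, existing values are left untouched, and afterwards plain `obj[field]` lookups no longer raise `KeyError`.

```python
import json

fields_you_care_about = ["hostid", "time", "userid", "link", "title"]

# One line of the input file; this object has no "link" or "title" field.
obj = json.loads('{"userid": 227954, "hostid": "6096407936827c96fa0833f26ab33b76", "time": 376541}')

for field in fields_you_care_about:
    # Inserts field -> None only if the field is not already present.
    obj.setdefault(field, None)

print(obj["title"])   # None
print(obj["userid"])  # 227954

# Plain indexing is now safe for every wanted field.
row = [obj[field] for field in fields_you_care_about]
```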
Answer 1 (score: 1)
You can use `try`, because looking up a key that doesn't exist raises an exception (a `KeyError`):
```python
import json
import numpy as np

data = []
keys = ["hostid", "time", "userid", "link", "title"]
m = len(keys)
with open('test.json') as json_data:
    for line in json_data:
        dataline = json.loads(line)
        row = []
        for i in range(m):
            try:
                row.append(dataline[keys[i]])
            except KeyError:
                # The key is absent from this object
                row.append(None)
        data.append(row)
# The with block has already closed the file.
data = np.array(data)
```
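Catching `KeyError` specifically, rather than a broad `Exception`, means only a missing key is treated as missing data; an unrelated bug inside the `try` block (say, a misspelled variable raising `NameError`) would still surface. A sketch of the same loop wrapped in a hypothetical helper, `extract_row`, so it can be checked in isolation:

```python
def extract_row(dataline, keys):
    """Return the values for keys in order, using None for absent keys."""
    row = []
    for key in keys:
        try:
            row.append(dataline[key])
        except KeyError:
            # Only a missing key is treated as missing data.
            row.append(None)
    return row


keys = ["hostid", "time", "userid", "link", "title"]
print(extract_row({"userid": 227954, "time": 376541}, keys))
# [None, 376541, 227954, None, None]
```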
Answer 2 (score: 1)
I'd try this. I also just iterate over the list of keys directly.
```python
with open('test.json') as json_data:
    for line in json_data:
        dataline = json.loads(line)
        row = []
        for i in keys:  # iterate through keys
            try:
                row.append(dataline[i])
            except KeyError:
                # Record the missing key as None so every row keeps one slot per key
                row.append(None)
        data.append(row)
```
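Appending `None` for a missing key, rather than skipping it, keeps every row the same length, so position *i* always refers to `keys[i]`. Skipped keys would shift later values into the wrong column, and NumPy cannot build a regular 2-D array from rows of unequal length. A small comparison on the two sample objects (`field5` omitted):

```python
keys = ["hostid", "time", "userid", "link", "title"]
records = [
    {"hostid": "3969937ab0a3e2db8690c482564006a7", "time": 376541, "userid": 130290,
     "link": "http://monsters4ever.com/tagged/rosemary%27s%20baby%20(1968)",
     "title": "Monster Man"},
    {"hostid": "6096407936827c96fa0833f26ab33b76", "time": 376541, "userid": 227954,
     "link": "http://www.sfgate.com/world/article/WORLD-News-of-the-Day-From-Across-the-Globe-4120318.php"},
]

# Skipping absent keys: the second row ends up one element shorter.
skipped = [[r[k] for k in keys if k in r] for r in records]
# Filling with None: every row has len(keys) entries, aligned by position.
filled = [[r.get(k) for k in keys] for r in records]

print([len(row) for row in skipped])  # [5, 4]
print([len(row) for row in filled])   # [5, 5]

# The "title" column is at the same index in every filled row.
titles = [row[keys.index("title")] for row in filled]
print(titles)  # ['Monster Man', None]
```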