[{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null}]
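The above closely resembles the JSON I get back from a simple Salesforce query.
The code below is supposed to convert it to JSONL while also fixing up the datetime values.
The catch is that I have to get rid of the attributes part, since it is not used. The code below is my latest attempt, but all it produces is the same record over and over. (The data above is duplicated, so if you loop over it I would expect identical records.)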
    for element in data :
        item = data.pop()
        item.pop('attributes', None)
        tempdict = OrderedDict({})
        for k,v in item.items() :
            if 'date' in k.lower() or 'stamp' in k.lower() :
                if not v is None :
                    d = d_parse(v)
                    v = d.strftime('%Y-%m-%d %I:%M:%S')
                    tempdict[k.lower()] = v
            else :
                tempdict[k.lower()] = v
        with open(localFilePath+fileName.format(nextObj,fileCount), 'a') as outfile :
            outfile.write(json.dumps(tempdict))
            outfile.write('\n')
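The problem is that, for some reason, half of the records are lost: only 384 of the 767 records end up in the file. I suspect it has to do with pop and where it appears in the code. How can I get rid of the attributes part without losing half of the records to the pop?
Edit:
The following code (based on the comments) raises an error: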
    for element in data :
        data.pop('attributes', None)
        tempdict = OrderedDict({})
        for k,v in data.items() :
            if 'date' in k.lower() or 'stamp' in k.lower() :
                if not v is None :
                    d = d_parse(v)
                    v = d.strftime('%Y-%m-%d %I:%M:%S')
                    tempdict[k.lower()] = v
            else :
                tempdict[k.lower()] = v
        with open(localFilePath+fileName.format(nextObj,fileCount), 'a') as outfile :
            outfile.write(json.dumps(tempdict))
            outfile.write('\n')
    Traceback (most recent call last):
      File "child_sfdc_etl.py", line 417, in <module>
        sfToS3(fileCount, sf, nextObj)
      File "child_sfdc_etl.py", line 206, in sfToS3
        send_temp_jsonl_to_s3(data, nextObj, s3, s3Destination, fileCount, s3Path)
      File "child_sfdc_etl.py", line 254, in send_temp_jsonl_to_s3
        data.pop('attributes', None)
    TypeError: pop() takes at most 1 argument (2 given)
The code without None also raises an error:
    for element in data :
        data.pop('attributes')
        tempdict = OrderedDict({})
        for k,v in data.items() :
            if 'date' in k.lower() or 'stamp' in k.lower() :
                if not v is None :
                    d = d_parse(v)
                    v = d.strftime('%Y-%m-%d %I:%M:%S')
                    tempdict[k.lower()] = v
            else :
                tempdict[k.lower()] = v
        with open(localFilePath+fileName.format(nextObj,fileCount), 'a') as outfile :
            outfile.write(json.dumps(tempdict))
            outfile.write('\n')
    Traceback (most recent call last):
      File "child_sfdc_etl.py", line 417, in <module>
        sfToS3(fileCount, sf, nextObj)
      File "child_sfdc_etl.py", line 206, in sfToS3
        send_temp_jsonl_to_s3(data, nextObj, s3, s3Destination, fileCount, s3Path)
      File "child_sfdc_etl.py", line 254, in send_temp_jsonl_to_s3
        data.pop('attributes')
    TypeError: 'str' object cannot be interpreted as an integer
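Answer 0 (score: 0)
This has to do with how iteration is implemented in Python. As others have pointed out, the culprit is
    for element in data :
        item = data.pop()
        <...>
A sequence iterator in Python keeps the index of the current element to decide what to return next. (In general, a sequence cannot be iterated correctly if it is changed along the way, so this is not a bug.)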
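You take one item (starting from the beginning of the list) as element. Then you remove the last item of the list as item and ignore element completely. On the next iteration, element is the item after the previous element, and so on. As a result, you only process the second half of the initial list, in reverse order.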
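To see the effect in isolation, here is a minimal sketch that repeats the same pattern on a plain list of 767 integers. The function name pop_while_iterating is made up for illustration; the integers stand in for the Salesforce records.

```python
def pop_while_iterating(records):
    """Repeat the question's pattern: iterate over a list while popping from its end."""
    processed = []
    for element in records:      # the iterator advances an internal index from the front
        item = records.pop()     # meanwhile the list shrinks from the back
        processed.append(item)   # element itself is never used
    return processed


records = list(range(767))       # stand-in for the 767 records in the question
processed = pop_while_iterating(records)

print(len(processed))            # 384
print(processed[:3])             # [766, 765, 764]
print(len(records))              # 383 items were never reached
```

The loop stops as soon as the iterator's index catches up with the shrinking length, which happens after 384 passes. That matches the 384 of 767 records reported in the question.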
Remove data.pop() and use element instead.
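A rough sketch of the loop with that change applied, under a few assumptions. parse_sf_datetime is a hypothetical stand-in for the question's d_parse, whose definition is not shown; the SystemModstamp field and its value are invented sample data; and the output uses %H (24-hour clock) rather than the question's %I, which is my choice, not something the question asks for. Note also that data is a list, so data.pop() only accepts an integer index, which explains both TypeErrors in the edit. The dict method that takes a key and a default has to be called on the individual record.

```python
import json
from datetime import datetime


def parse_sf_datetime(value):
    """Hypothetical stand-in for the question's d_parse (its definition is not shown)."""
    for fmt in ('%Y-%m-%dT%H:%M:%S.%f%z', '%Y-%m-%d'):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError('unrecognized date format: %r' % value)


def to_jsonl_lines(data):
    """Return one JSON string per record, without 'attributes' and with lowercased keys."""
    lines = []
    for element in data:                  # iterate only; the list is never modified
        record = {}
        for k, v in element.items():
            if k == 'attributes':         # skip the unused metadata block
                continue
            if v is not None and ('date' in k.lower() or 'stamp' in k.lower()):
                v = parse_sf_datetime(v).strftime('%Y-%m-%d %H:%M:%S')
            record[k.lower()] = v         # null dates are kept as null here
        lines.append(json.dumps(record))
    return lines


# Invented sample data shaped like the records in the question.
data = [
    {"attributes": {"type": "Silo__c"}, "Id": "a0M36000007xRItEAM", "Name": "Fresh",
     "Service_Exit_Date__c": None, "SystemModstamp": "2017-01-02T03:04:05.000+0000"},
] * 3

lines = to_jsonl_lines(data)
print(len(lines))                         # 3
print(lines[0])
```

Skipping the attributes key while copying, instead of popping it, leaves the original records untouched. Writing the file is then a matter of joining the returned lines with '\n', which also lets the file be opened once rather than once per record.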