Losing half of the records when iterating over a JSON list

Date: 2019-04-12 00:38:23

Tags: python json iterator

[{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null}]

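The above closely resembles the JSON I get back from a simple Salesforce query.

The code below is supposed to convert it to JSONL, and to clean up the datetime values along the way.

The catch is that I have to get rid of the attributes part, since it is not used. The code below is my latest attempt, but all it produces is the same record over and over. (The sample data above is duplicated, so if you iterate over it I would expect the records to be identical.)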

for element in data :

        item = data.pop()
        item.pop('attributes', None)

        tempdict = OrderedDict({})
        for k,v in item.items() :
            if 'date' in k.lower() or 'stamp' in k.lower() :
                if not v is None :
                    d = d_parse(v)
                    v = d.strftime('%Y-%m-%d %I:%M:%S')
                    tempdict[k.lower()] = v
            else :
                tempdict[k.lower()] = v

        with open(localFilePath+fileName.format(nextObj,fileCount), 'a') as outfile :
            outfile.write(json.dumps(tempdict))
            outfile.write('\n')

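The problem is that, for some reason, half of the records are lost. Only 384 of the 767 records end up in the file. I suspect the issue has to do with pop and where it appears in the code. How can I get rid of the attributes part without losing half of the records to the pop?

Edit:

The following code raises an error (based on the comments):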

for element in data :
    data.pop('attributes', None)

    tempdict = OrderedDict({})
    for k,v in data.items() :
        if 'date' in k.lower() or 'stamp' in k.lower() :
            if not v is None :
                d = d_parse(v)
                v = d.strftime('%Y-%m-%d %I:%M:%S')
                tempdict[k.lower()] = v
        else :
            tempdict[k.lower()] = v

    with open(localFilePath+fileName.format(nextObj,fileCount), 'a') as outfile :
        outfile.write(json.dumps(tempdict))
        outfile.write('\n')


Traceback (most recent call last):
  File "child_sfdc_etl.py", line 417, in <module>
    sfToS3(fileCount, sf, nextObj)
  File "child_sfdc_etl.py", line 206, in sfToS3
    send_temp_jsonl_to_s3(data, nextObj, s3, s3Destination, fileCount, s3Path)
  File "child_sfdc_etl.py", line 254, in send_temp_jsonl_to_s3
    data.pop('attributes', None)
TypeError: pop() takes at most 1 argument (2 given)

The code without None also raises an error:

for element in data :
    data.pop('attributes')

    tempdict = OrderedDict({})
    for k,v in data.items() :
        if 'date' in k.lower() or 'stamp' in k.lower() :
            if not v is None :
                d = d_parse(v)
                v = d.strftime('%Y-%m-%d %I:%M:%S')
                tempdict[k.lower()] = v
        else :
            tempdict[k.lower()] = v

    with open(localFilePath+fileName.format(nextObj,fileCount), 'a') as outfile :
        outfile.write(json.dumps(tempdict))
        outfile.write('\n')

Traceback (most recent call last):
  File "child_sfdc_etl.py", line 417, in <module>
    sfToS3(fileCount, sf, nextObj)
  File "child_sfdc_etl.py", line 206, in sfToS3
    send_temp_jsonl_to_s3(data, nextObj, s3, s3Destination, fileCount, s3Path)
  File "child_sfdc_etl.py", line 254, in send_temp_jsonl_to_s3
    data.pop('attributes')
TypeError: 'str' object cannot be interpreted as an integer

1 Answer:

Answer 0: (score: 0)

This has to do with how iteration is implemented in Python. As others have pointed out, the culprit is

for element in data :
    item = data.pop()
    <...>

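Sequence iterators in Python keep the index of the current element to decide what to return next. (In general, a sequence cannot be iterated correctly if it is modified along the way, so this is not a bug.)

You get one item, starting from the beginning of the list, as element. Then you remove the last item of the list as item and ignore element entirely. On the next iteration, element is the item after the previous element, and so on. Each pass advances the iterator by one and shortens the list by one, so the loop stops once the two meet in the middle. As a result, you only process the second half of the initial list, in reverse order.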
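The effect is easy to reproduce on a small list. The sketch below uses plain integers as stand-ins for the records, and the helper name buggy_loop is made up for illustration:

```python
def buggy_loop(n):
    """Mimic the question's loop on a list of n stand-in records."""
    data = list(range(1, n + 1))
    processed = []
    for element in data:       # the iterator advances by index: 0, 1, 2, ...
        item = data.pop()      # removes the LAST item; element is never used
        processed.append(item)
    return processed, data


processed, leftover = buggy_loop(7)
print(processed)   # [7, 6, 5, 4]
print(leftover)    # [1, 2, 3]

# With 767 records the loop body runs 384 times, matching the question.
print(len(buggy_loop(767)[0]))   # 384
```

The loop ends as soon as the iterator's index reaches the shrinking length of the list, which is why roughly half of the records (rounded up) are written.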


Remove data.pop() and use element instead.
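This also explains the two TypeErrors in the edit: data is a list, and list.pop accepts at most one argument, an integer index. Each element, on the other hand, is a dict, so the attributes key has to be dropped from element, not from data.

A minimal sketch of the loop rewritten that way follows. Several things in it are assumptions and not part of the original code: parse_sf_datetime is a hypothetical stand-in for d_parse, the CreatedDate field and its value format are invented sample data, null date fields are kept as null, and %H (24-hour clock) replaces %I because %I without AM/PM is ambiguous. The sketch returns the lines so that it stays self-contained; writing them to the file works as in the question.

```python
import json
from collections import OrderedDict
from datetime import datetime


def parse_sf_datetime(text):
    # Hypothetical stand-in for the question's d_parse; assumes values
    # shaped like '2019-04-12T00:38:23.000+0000'.
    return datetime.strptime(text, '%Y-%m-%dT%H:%M:%S.%f%z')


def to_jsonl_lines(data, parse=parse_sf_datetime):
    """Build one JSON line per record without mutating data."""
    lines = []
    for element in data:                 # use the loop variable, no data.pop()
        tempdict = OrderedDict()
        for k, v in element.items():     # element is a dict, data is a list
            if k == 'attributes':
                continue                 # skip the unused part
            is_date = 'date' in k.lower() or 'stamp' in k.lower()
            if is_date and v is not None:
                v = parse(v).strftime('%Y-%m-%d %H:%M:%S')
            tempdict[k.lower()] = v
        lines.append(json.dumps(tempdict))
    return lines


# Invented sample record, repeated three times like the data in the question.
records = [
    {"attributes": {"type": "Silo__c"}, "Id": "a0M36000007xRItEAM",
     "Name": "Fresh", "Landing_Stop_Date__c": None,
     "CreatedDate": "2019-04-12T00:38:23.000+0000"},
] * 3

lines = to_jsonl_lines(records)
print(len(lines))   # 3
print(lines[0])
```

Skipping the key while copying, as opposed to calling element.pop('attributes', None), leaves the input records untouched, which helps if data is reused later in the pipeline.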