Question

[{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null}]

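The above is very similar to the JSON I get back from a simple Salesforce query.

The code below is supposed to convert it to JSONL and also fix up the date/time values.

The problem is that I have to get rid of the attributes part, because it is not used. The code below is my latest attempt, but all it produces is the same record over and over. (The data above is duplicated, so if you iterate over it, I would expect the records to be identical.)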

for element in data :

        item = data.pop()
        item.pop('attributes', None)

        tempdict = OrderedDict({})
        for k,v in item.items() :
            if 'date' in k.lower() or 'stamp' in k.lower() :
                if not v is None :
                    d = d_parse(v)
                    v = d.strftime('%Y-%m-%d %I:%M:%S')
                    tempdict[k.lower()] = v
            else :
                tempdict[k.lower()] = v

        with open(localFilePath+fileName.format(nextObj,fileCount), 'a') as outfile :
            outfile.write(json.dumps(tempdict))
            outfile.write('\n')

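The problem is that, for some reason, half of the records are lost. Only 384 of the 767 records end up in the file. I suspect this has to do with pop and where it appears in the code. How can I get rid of the attributes part without losing half of the records to pop?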

Edit:

The following code (based on the comments) raises an error:

for element in data :
data.pop('attributes', None)

tempdict = OrderedDict({})
for k,v in data.items() :
    if 'date' in k.lower() or 'stamp' in k.lower() :
        if not v is None :
            d = d_parse(v)
            v = d.strftime('%Y-%m-%d %I:%M:%S')
            tempdict[k.lower()] = v
    else :
        tempdict[k.lower()] = v

with open(localFilePath+fileName.format(nextObj,fileCount), 'a') as outfile :
    outfile.write(json.dumps(tempdict))
    outfile.write('\n')


Traceback (most recent call last):
  File "child_sfdc_etl.py", line 417, in <module>
    sfToS3(fileCount, sf, nextObj)
  File "child_sfdc_etl.py", line 206, in sfToS3
    send_temp_jsonl_to_s3(data, nextObj, s3, s3Destination, fileCount, s3Path)
  File "child_sfdc_etl.py", line 254, in send_temp_jsonl_to_s3
    data.pop('attributes', None)
TypeError: pop() takes at most 1 argument (2 given)

The code without None also raises an error:

for element in data :
data.pop('attributes')

tempdict = OrderedDict({})
for k,v in data.items() :
    if 'date' in k.lower() or 'stamp' in k.lower() :
        if not v is None :
            d = d_parse(v)
            v = d.strftime('%Y-%m-%d %I:%M:%S')
            tempdict[k.lower()] = v
    else :
        tempdict[k.lower()] = v

with open(localFilePath+fileName.format(nextObj,fileCount), 'a') as outfile :
    outfile.write(json.dumps(tempdict))
    outfile.write('\n')

Traceback (most recent call last):
  File "child_sfdc_etl.py", line 417, in <module>
    sfToS3(fileCount, sf, nextObj)
  File "child_sfdc_etl.py", line 206, in sfToS3
    send_temp_jsonl_to_s3(data, nextObj, s3, s3Destination, fileCount, s3Path)
  File "child_sfdc_etl.py", line 254, in send_temp_jsonl_to_s3
    data.pop('attributes')
TypeError: 'str' object cannot be interpreted as an integer

Answer 1

This has to do with how iteration is implemented in Python. As others have pointed out, the culprit is

for element in data :
    item = data.pop()
    <...>

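A sequence iterator in Python keeps the index of the current element to determine what to return next. (In general, it is not possible to iterate correctly over a sequence that is being modified along the way, so this is not a bug.)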

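You get one item (starting from the beginning of the list) as element. Then you remove the last item of the list as item and ignore element completely. On the next iteration, element is the item after the previous element, and so on. As a result, you only process the second half of the initial list, in reverse order.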
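To make the effect concrete, here is a minimal sketch that reproduces the pattern on a plain list. The function name pop_while_iterating is made up for this illustration and does not appear in the question's code.

```python
def pop_while_iterating(items):
    """Mimic the loop from the question: iterate forward, pop from the end."""
    data = list(items)          # work on a copy so the caller's list is untouched
    processed = []
    for element in data:        # the iterator advances by index: 0, 1, 2, ...
        item = data.pop()       # ...while the list shrinks from the end
        processed.append(item)  # element itself is never used
    return processed


print(pop_while_iterating([1, 2, 3, 4, 5, 6, 7]))    # [7, 6, 5, 4]
print(len(pop_while_iterating(range(767))))          # 384
```

The loop stops as soon as the iterator's index reaches the shrinking length of the list, so only about half of the items are ever seen. With 767 items that is 384, which matches the count reported in the question.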

Remove data.pop() and use element.
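A minimal sketch of that fix follows. It assumes data is a list of dicts shaped like the sample in the question. The helper name records_to_jsonl is hypothetical, and the date formatting and file writing from the original loop are left out to keep the example self-contained. Note that pop('attributes', None) is valid on each element (a dict), whereas calling it on data (a list) produces the TypeError shown in the question's edit.

```python
import json


def records_to_jsonl(records):
    """Return one JSON line per record, without the 'attributes' entry.

    Hypothetical helper: assumes every record is a dict.
    """
    lines = []
    for element in records:           # use element directly, no data.pop()
        cleaned = dict(element)       # shallow copy, the input stays unchanged
        cleaned.pop('attributes', None)
        lowered = {k.lower(): v for k, v in cleaned.items()}
        lines.append(json.dumps(lowered))
    return "".join(line + "\n" for line in lines)


sample = [
    {"attributes": {"type": "Silo__c"}, "Id": "a0M36000007xRItEAM", "Name": "Fresh"},
    {"attributes": {"type": "Silo__c"}, "Id": "a0M36000007xRItEAM", "Name": "Fresh"},
]
print(records_to_jsonl(sample), end="")
```

Every input record yields exactly one output line, because the list is no longer modified while it is being iterated.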

Half of the records are lost when iterating over a JSON list
