I have mixed-level JSON to parse in Python and am having trouble with keys

Time: 2019-03-26 03:24:08

Tags: python json

I have a set of nested JSON, and so far I am doing the following:

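```python
import json
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

r = session.get(search_url, auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL), verify=False)
json_data = json.loads(r.content)
flattened_data = json_normalize(json_data['documents'])
print(list(flattened_data))
```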

This outputs the following:

['affected_users', 'aggregatedLabels', 'aliases', 'assignedFolder', 'assigneeIdentity', 'attachments', 'authorizations', 'autoUpgrade.workingHours', 'conversation', 'createDate', 'dedupes', 'deleted', 'description', 'descriptionContentType', 'editCount', 'engagementList', 'extensions.backlog.priority', 'extensions.effort.effortEstimatedLocal.effort', 'extensions.effort.effortEstimatedLocal.unit', 'extensions.effort.effortEstimatedRecursiveSum.effort', 'extensions.effort.effortEstimatedRecursiveSum.unit', 'extensions.effort.effortRemainingLocalSum.effort', 'extensions.effort.effortRemainingLocalSum.unit', 'extensions.effort.effortRemainingRecursiveSum.effort', 'extensions.effort.effortRemainingRecursiveSum.unit', 'extensions.effort.effortSpentLocalSum.effort', 'extensions.effort.effortSpentLocalSum.unit', 'extensions.effort.effortSpentRecursiveSum.effort', 'extensions.effort.effortSpentRecursiveSum.unit', 'extensions.tt.assignedGroup', 'extensions.tt.building', 'extensions.tt.caseType', 'extensions.tt.category', 'extensions.tt.city', 'extensions.tt.endCode', 'extensions.tt.ecd', 'extensions.tt.impact', 'extensions.tt.item', 'extensions.tt.justification', 'extensions.tt.migrationStatus', 'extensions.tt.minImpact', 'extensions.tt.resolution', 'extensions.tt.rootCause', 'extensions.tt.rootCauseDetails', 'extensions.tt.status', 'extensions.tt.type', 'frames', 'id', 'identityTimestamped', 'inheritedLabels', 'isTicket', 'labels', 'lastAssignedDate', 'lastResolvedByIdentity', 'lastResolvedDate', 'lastUpdatedActualDate', 'lastUpdatedConversationDate', 'lastUpdatedDate', 'lastUpdatedIdentity', 'next_step.action', 'next_step.exceptions', 'next_step.owner', 'parentTasks', 'requesterIdentity', 'rootCauses', 'rulesReceipt', 'schedule.estimatedCompletionDate', 'schedule.estimatedStartDate', 'schedule.needByDate', 'schema', 'slaReceipts', 'status', 'stickyThreadId', 'submitterIdentity', 'subtasks', 'tags', 'threads', 'title', 'watchers']
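A minimal sketch of why the column names look like this, assuming `documents` is a list of dicts (the two sample records below are made up and only mirror a few of the question's fields). `json_normalize` flattens each nested dict into dotted column names, so `extensions.tt.status` is a single flat column label, not a nested lookup. The sketch uses `pandas.json_normalize`, the pandas 1.0+ spelling of the function imported above.

```python
import pandas as pd

# Hypothetical sample records mirroring part of the question's nesting
documents = [
    {"id": 1, "title": "a", "extensions": {"tt": {"status": "Open", "impact": 3}}},
    {"id": 2, "title": "b", "extensions": {"tt": {"status": "Resolved", "impact": 5}}},
]

# Each dict in the list becomes one row; nested keys are joined with "."
df = pd.json_normalize(documents)
print(sorted(df.columns))

# The dotted name is an ordinary column label
print(df["extensions.tt.status"].tolist())
```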

From this list, I am trying to put only certain keys and their values into a DataFrame:

```python
print(flattened_data['assigneeIdentity',
                     'createDate',
                     'description',
                     'extensions.tt.assignedGroup',
                     'extensions.tt.category',
                     'extensions.tt.endCode',
                     'extensions.tt.ecd',
                     'extensions.tt.impact',
                     'extensions.tt.item',
                     'extensions.tt.justification',
                     'extensions.tt.resolution',
                     'extensions.tt.rootCause',
                     'extensions.tt.rootCauseDetails',
                     'extensions.tt.status',
                     'extensions.tt.type',
                     'id',
                     'labels',
                     'lastAssignedDate',
                     'lastResolvedByIdentity',
                     'lastResolvedDate',
                     'lastUpdatedActualDate',
                     'lastUpdatedConversationDate',
                     'lastUpdatedDate',
                     'lastUpdatedIdentity',
                     'requesterIdentity',
                     'submitterIdentity',
                     'title',
                     'watchers'])
```

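When I do this, I get a KeyError. For the fields listed above, the underlying JSON looks like the following, which gives an idea of each field's nesting level. Each "item" is an integer under the documents element, and beneath it are further nested elements that I need: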

documents:
          0:
             extensions:
                         tt:
                             category:
                             type:
                             item:
                             assignedGroup:
                             impact:
                             justification:
                             endCode:
                             rootCause:
                             rootCauseDetails:
                             status:
              id:
              title:
              lastAssignedDate:
              createDate:
              lastUpdatedActualDate:
              lastResolvedDate:
              lastResolvedByIdentity:
              lastUpdatedIdentity:
              assigneeIdentity:
              submitterIdentity:
              requesterIdentity:
              identityTimestamped:
              lastUpdatedConversationDate:
              lastUpdatedDate:
          1:
             extensions:
                         tt:
                             category:
                             type:
                             item:
                             assignedGroup:
                             impact:
                             justification:
                             endCode:
                             rootCause:
                             rootCauseDetails:
                             status:
              id:
              title:
              lastAssignedDate:
              createDate:
              lastUpdatedActualDate:
              lastResolvedDate:
              lastResolvedByIdentity:
              lastUpdatedIdentity:
              assigneeIdentity:
              submitterIdentity:
              requesterIdentity:
              identityTimestamped:
              lastUpdatedConversationDate:
              lastUpdatedDate:

How can I get these keys and their values into a DataFrame?

2 answers:

Answer 0 (score: 0)

Quoting from a fantastic response I commented on just today. Maybe this will help:

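```javascript
// randomIndex is called but not shown in the quoted excerpt; this is an ASSUMED
// weighted-pick helper that returns index i with probability weights[i] / sum(weights).
function randomIndex(weights) {
  const total = weights.reduce((a, b) => a + b, 0);
  let r = Math.random() * total;
  for (let i = 0; i < weights.length; i++) {
    r -= weights[i];
    if (r < 0) return i;
  }
  return weights.length - 1; // guard against floating-point rounding at the upper end
}

const probabilities = [50, 10, 10, 10, 10, 2, 2, 2, 2, 2];
let hits = probabilities.map(() => 0);
const numAttempts = 10000;
// Draw numAttempts indices and count how often each one is picked
for (let k = 0; k < numAttempts; k++) {
    hits[randomIndex(probabilities)]++;
}
// Print each configured probability next to the observed frequency (in percent)
for (let i = 0; i < probabilities.length; i++) {
    console.log("" + i + ": prob=" + probabilities[i] +
      ", freq=" + (100 * hits[i] / numAttempts).toFixed(1));
}

/*
Example of console.log output:
0: prob=50, freq=49.4
1: prob=10, freq=10.3
2: prob=10, freq=10.4
3: prob=10, freq=9.9
4: prob=10, freq=10.3
5: prob=2, freq=2.2
6: prob=2, freq=1.8
7: prob=2, freq=1.8
8: prob=2, freq=1.9
9: prob=2, freq=2.1
*/
```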

Answer 1 (score: 0)

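`flattened_data` should already be a valid DataFrame. The error seems to be that you are trying to print `flattened_data["key1", "key2", ...]`, which makes pandas look in `flattened_data` for a single column named `("key1", "key2", ...)`. Essentially, you are telling the DataFrame "get the column whose name is this list of keys" (Python actually passes the keys as one tuple).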
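The failing lookup can be reproduced on a small, hypothetical DataFrame: with single brackets, pandas receives the tuple `("id", "title")` as one column label, finds no such column, and raises `KeyError`.

```python
import pandas as pd

# Hypothetical frame standing in for flattened_data
df = pd.DataFrame({"id": [1, 2], "title": ["a", "b"]})

# df["id", "title"] is the same as df[("id", "title")]:
# a lookup of ONE column whose label is the tuple ("id", "title")
try:
    df["id", "title"]
    error = None
except KeyError as exc:
    error = exc

print(repr(error))
```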

To get a list of columns from the DataFrame, try `flattened_data[["key1", "key2", ...]]` instead, which says "get all columns whose names are in this list".
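A sketch of the double-bracket selection on a hypothetical frame. The column `extensions.tt.ecd` is deliberately absent to show `DataFrame.reindex(columns=...)`, which fills missing columns with NaN instead of raising; that may be worth considering if not every document carries every field.

```python
import pandas as pd

# Hypothetical frame standing in for flattened_data
df = pd.DataFrame({
    "id": [1, 2],
    "title": ["a", "b"],
    "extensions.tt.status": ["Open", "Resolved"],
})

wanted = ["id", "extensions.tt.status"]
# A list inside the brackets returns a DataFrame with exactly these columns, in this order
subset = df[wanted]
print(subset)

# df[...] with a missing name still raises KeyError; reindex adds it as an all-NaN column
safe = df.reindex(columns=wanted + ["extensions.tt.ecd"])
print(safe)
```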

Another thing that may be happening here is that you have a DataFrame with columns `["0.id", "0.title", ..., "1.id", "1.title", ...]` and only one row, with one value for each path in the JSON object.
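To check this hypothesis, here is a sketch with a hypothetical payload in which `documents` is a dict keyed by `"0"`, `"1"` (as the question's outline suggests) rather than a list. `json_normalize` then treats the whole dict as a single record, so the index keys end up as prefixes of the column names.

```python
import pandas as pd

# Hypothetical payload: "documents" is a dict keyed by "0", "1", ... instead of a list
json_data = {
    "documents": {
        "0": {"id": 1, "title": "a", "extensions": {"tt": {"status": "Open"}}},
        "1": {"id": 2, "title": "b", "extensions": {"tt": {"status": "Resolved"}}},
    }
}

# A single dict is normalized as ONE record: one row, one column per JSON path
wide = pd.json_normalize(json_data["documents"])
print(wide.shape)
print(list(wide.columns))
```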

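However, `pandas.io.json.json_normalize()` (also available as `pandas.json_normalize()` in pandas 1.0+) can take a list of dicts as its argument. So in `flattened_data = json_normalize(json_data['documents'])`, passing the list of sub-dicts (e.g. `json_data['documents'].values()`) instead of `json_data['documents']` itself should return the correct DataFrame, with one row per document.

You can then retrieve the columns you want with the double-bracket selection described above.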
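Putting the two steps together, a minimal sketch that assumes the same kind of hypothetical dict-shaped payload. `list(...)` turns the dict values into a plain list of records, and the `isinstance` check keeps the code working if `documents` is already a list.

```python
import pandas as pd

# Hypothetical payload shaped like the question's outline
json_data = {
    "documents": {
        "0": {"id": 1, "title": "a", "extensions": {"tt": {"status": "Open"}}},
        "1": {"id": 2, "title": "b", "extensions": {"tt": {"status": "Resolved"}}},
    }
}

docs = json_data["documents"]
# Accept both shapes: a dict keyed by "0", "1", ... or an existing list of dicts
records = list(docs.values()) if isinstance(docs, dict) else docs

flattened_data = pd.json_normalize(records)  # one row per document
wanted = ["id", "title", "extensions.tt.status"]
print(flattened_data[wanted])
```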