Question

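I have a JSON file that I have converted to a dictionary. The file contains two so-called "simple headings", "year" and "category", which stand on their own in the JSON. The new column I want to create will be called "awarded_or_not", and it should be derived from the entries in the dictionary titled "laureates" in the JSON file.

So far I have got it to retrieve and print the two "simple headings":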

import json
import pandas as pd

def report(nobelprizeDict):
  # convert dictionary to DataFrame
  df = pd.DataFrame.from_dict(nobelprizeDict)
  # select columns 'year' and 'category'
  res = df[['year', 'category']]
  # return result
  return res

with open('nobelprizes.json', 'rt') as f:
  nobel = json.load(f)

df_years_categories = report(nobel)

print(df_years_categories)

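If I were to write, for example, res = df[['year', 'category', 'laureates']], the 'laureates' column would contain the entire list of entries from the laureates dictionary for each row.

I hope this makes sense and that somebody can correct it so I can see what I did wrong.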

Answer 1

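Here is an example. I use numpy to turn "does laureates have a value or not" into a True/False flag, and then add a column whose value depends on that flag. Note that you need to pass nobelprizeDict['prizes'] (in my case) rather than the whole dictionary: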

import json
import pandas as pd
import numpy as np

def report(nobelprizeDict):
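  # convert the list stored under 'prizes' to a DataFrame
  df = pd.DataFrame.from_dict(nobelprizeDict['prizes'])
  # select columns 'year', 'category' and 'laureates';
  # copy so that new columns can be added without a SettingWithCopyWarning
  res = df[['year', 'category', 'laureates']].copy()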
  return res

with open('nobelprizes.json', 'rt') as f:
  nobel = json.load(f)

df_years_categories = report(nobel)
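# True where the prize has no 'laureates' entry (the value is missing)
df_years_categories['laureates'] = df_years_categories['laureates'].isna()
# 'NO' where laureates is missing, 'YES' otherwise
df_years_categories['awarded_or_not'] = np.where(df_years_categories['laureates'], 'NO', 'YES')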

print(df_years_categories)
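
For anyone who wants to try the idea without the JSON file, here is a minimal self-contained sketch. The sample dictionary is made up for illustration. It assumes the same layout as above: a top-level 'prizes' list in which a prize that was not awarded simply has no 'laureates' key. If your file marks unawarded prizes differently, the check would need adjusting.

```python
import pandas as pd

# Hypothetical sample data (not taken from the real file): the second
# prize has no 'laureates' key, which is how "not awarded" is assumed
# to be represented.
sample = {
    "prizes": [
        {"year": "2001", "category": "physics",
         "laureates": [{"id": "1", "surname": "Example"}]},
        {"year": "2002", "category": "peace"},
    ]
}

def report(nobelprize_dict):
    # One row per prize; a missing 'laureates' key becomes NaN.
    df = pd.DataFrame(nobelprize_dict["prizes"])
    # Copy the two simple columns so a new column can be added safely.
    res = df[["year", "category"]].copy()
    # notna() is True where a laureates list exists for that prize.
    res["awarded_or_not"] = df["laureates"].notna().map({True: "YES", False: "NO"})
    return res

result = report(sample)
print(result)
```

This version leaves the 'laureates' column out of the result and only keeps the YES/NO flag, which seems to be what the question asks for. Note that it would raise a KeyError if no prize in the data had a 'laureates' key at all.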
