How do I denormalize YAML into a Pandas DataFrame?

Asked: 2019-01-18 17:55:29

Tags: python pandas dataframe yaml denormalization

I'm trying to get data from a YAML file into a Pandas DataFrame. Take the following example data.yml:

---
 - doc: "Book1"
   reviews:
     - reviewer: "Paul"
       stars: "5"
     - reviewer: "Sam"
       stars: "2"
 - doc: "Book2"
   reviews:
     - reviewer: "John"
       stars: "4"
     - reviewer: "Sam"
       stars: "3"
     - reviewer: "Pete"
       stars: "2"
...

The desired DataFrame looks like this:

     doc reviews.reviewer reviews.stars
0  Book1             Paul             5
1  Book1              Sam             2
2  Book2             John             4
3  Book2              Sam             3
4  Book2             Pete             2

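I've tried feeding the YAML data to Pandas in different ways (e.g. with open('data.yml') as f: data = pd.DataFrame(yaml.load(f))), but the cells always end up containing nested dictionaries. This solution works for general JSON data, but it takes a lot of code, and it seems like there should be a simpler solution for YAML.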
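To make the problem concrete, here is a minimal sketch (assuming PyYAML and pandas are installed, with the data.yml content inlined as a YAML flow-style string instead of read from a file, and yaml.safe_load used in place of yaml.load) showing that pd.DataFrame gives one row per book, with each book's reviews packed into a single cell as a list of dicts:

```python
import pandas as pd
import yaml

# Same data as data.yml, written in YAML flow style so the example is self-contained
YAML_TEXT = """
- {doc: Book1, reviews: [{reviewer: Paul, stars: "5"}, {reviewer: Sam, stars: "2"}]}
- {doc: Book2, reviews: [{reviewer: John, stars: "4"}, {reviewer: Sam, stars: "3"}, {reviewer: Pete, stars: "2"}]}
"""

data = yaml.safe_load(YAML_TEXT)
df = pd.DataFrame(data)

# One row per book, not per review: the "reviews" column holds raw lists of dicts
print(df.shape)               # (2, 2)
print(df.loc[0, "reviews"])   # [{'reviewer': 'Paul', 'stars': '5'}, {'reviewer': 'Sam', 'stars': '2'}]
```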

Is there a built-in or Pythonic way to denormalize YAML into a Pandas DataFrame?

2 Answers:

Answer 0 (score: 2)

Using the approach from the other answer now emits a FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead.

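# Let's say the YAML file is test_sample.yaml
from os import getcwd, path

from pandas import json_normalize
from yaml import SafeLoader, load

# Replace ... with the directory components that lead to the file
path_to_yaml = path.join(getcwd(), ..., "test_sample.yaml")
with open(path_to_yaml) as yaml_file:
    # Parse the open file handle, not the path string
    yaml_contents = load(yaml_file, Loader=SafeLoader)
# One row per review; the parent "doc" value is repeated on each row
yaml_df = json_normalize(yaml_contents, record_path="reviews", meta="doc")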
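A minimal self-contained sketch of the same call, assuming pandas 1.0 or later (where json_normalize is exposed at the top level) and PyYAML; the data.yml content is inlined as a YAML flow-style string so no file path is needed:

```python
import pandas as pd
import yaml

# Same data as data.yml, written in YAML flow style
YAML_TEXT = """
- {doc: Book1, reviews: [{reviewer: Paul, stars: "5"}, {reviewer: Sam, stars: "2"}]}
- {doc: Book2, reviews: [{reviewer: John, stars: "4"}, {reviewer: Sam, stars: "3"}, {reviewer: Pete, stars: "2"}]}
"""

yaml_contents = yaml.safe_load(YAML_TEXT)

# record_path picks the nested list to explode into rows;
# meta copies the parent-level "doc" field onto each of those rows
yaml_df = pd.json_normalize(yaml_contents, record_path="reviews", meta="doc")
print(yaml_df)
```

Without record_path and meta, json_normalize only flattens nested dicts, so the reviews lists would stay packed into a single column just as with pd.DataFrame.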

Answer 1 (score: 1)

After loading the YAML, use json_normalize to flatten the nested dictionaries:

pd.io.json.json_normalize(yaml.safe_load(f), 'reviews', 'doc')

  reviewer stars    doc
0     Paul     5  Book1
1      Sam     2  Book1
2     John     4  Book2
3      Sam     3  Book2
4     Pete     2  Book2
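To reproduce the exact column names and order from the question ("doc", "reviews.reviewer", "reviews.stars"), json_normalize's record_prefix parameter can prefix the record columns. A minimal sketch, assuming pandas 1.0+ and PyYAML, with the data inlined in YAML flow style; the integer conversion of stars is an optional extra, since the quoted values in data.yml load as strings:

```python
import pandas as pd
import yaml

# Same data as data.yml, written in YAML flow style
YAML_TEXT = """
- {doc: Book1, reviews: [{reviewer: Paul, stars: "5"}, {reviewer: Sam, stars: "2"}]}
- {doc: Book2, reviews: [{reviewer: John, stars: "4"}, {reviewer: Sam, stars: "3"}, {reviewer: Pete, stars: "2"}]}
"""

data = yaml.safe_load(YAML_TEXT)

# record_prefix renames the record columns to "reviews.reviewer" / "reviews.stars"
df = pd.json_normalize(data, record_path="reviews", meta="doc", record_prefix="reviews.")

# Stars are quoted in the YAML, so they arrive as strings; convert them to int
df = df.astype({"reviews.stars": int})

# Match the column order shown in the question
df = df[["doc", "reviews.reviewer", "reviews.stars"]]
print(df)
```

record_prefix affects only the record columns; the meta columns have their own meta_prefix parameter.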