Extracting data from a list of dictionaries in Tone Analyser's JSON response

Asked: 2018-11-14 23:23:29

Tags: python python-3.x ibm-watson tone-analyzer

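I am analysing text with IBM Watson's Tone Analyzer, and I am trying to extract all the information related to sentence tone (e.g. sentence_id, text, tones, tone_id, tone_name, score) and add it to a dataframe (with the columns sentence_id, text, tones, tone_id, tone_name and score). Here is a sample of my output:

[output sample not preserved]

Here is the code I wrote to get this output:

[code not preserved]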

1 Answer:

Answer 0 (score: 0)

First, since your JSON is not well formed, I am using the sample JSON from the Tone Analyzer API reference instead.

Using the JSON from the API reference and pandas' json_normalize, this is the code I came up with:

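# json_normalize is exposed at the top level since pandas 1.0;
# on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize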

jsonfile = {
  "document_tone": {
    "tones": [
      {
        "score": 0.6165,
        "tone_id": "sadness",
        "tone_name": "Sadness"
      },
      {
        "score": 0.829888,
        "tone_id": "analytical",
        "tone_name": "Analytical"
      }
    ]
  },
  "sentences_tone": [
    {
      "sentence_id": 0,
      "text": "Team, I know that times are tough!",
      "tones": [
        {
          "score": 0.801827,
          "tone_id": "analytical",
          "tone_name": "Analytical"
        }
      ]
    },
    {
      "sentence_id": 1,
      "text": "Product sales have been disappointing for the past three quarters.",
      "tones": [
        {
          "score": 0.771241,
          "tone_id": "sadness",
          "tone_name": "Sadness"
        },
        {
          "score": 0.687768,
          "tone_id": "analytical",
          "tone_name": "Analytical"
        }
      ]
    },
    {
      "sentence_id": 2,
      "text": "We have a competitive product, but we need to do a better job of selling it!",
      "tones": [
        {
          "score": 0.506763,
          "tone_id": "analytical",
          "tone_name": "Analytical"
        }
      ]
    }
  ]
}

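# One row per sentence; the nested 'tones' list stays in a single column
mydata = json_normalize(jsonfile['sentences_tone'])
print(mydata)

# One row per tone; record_path flattens each sentence's 'tones' list
tones_data = json_normalize(data=jsonfile['sentences_tone'], record_path='tones')
print(tones_data)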

The output dataframes will be:

   sentence_id                        ...                                            tones
0            0                        ...[{'score': 0.801827, 'tone_id': 'analytical', ...
1            1                        ...[{'score': 0.771241, 'tone_id': 'sadness', 'to...
2            2                        ...[{'score': 0.506763, 'tone_id': 'analytical', ...

[3 rows x 3 columns]
      score     tone_id   tone_name
0  0.801827  analytical  Analytical
1  0.771241     sadness     Sadness
2  0.687768  analytical  Analytical
3  0.506763  analytical  Analytical
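
Note that the second dataframe above no longer says which sentence each tone belongs to. A minimal sketch of one way to keep sentence_id and text on every tone row, assuming pandas 1.0 or later (where json_normalize is available as pd.json_normalize) and using the meta parameter. The variable names and the trimmed two-sentence input are illustrative only:

```python
import pandas as pd

# Trimmed copy of the 'sentences_tone' list from the API reference JSON above.
sentences_tone = [
    {
        "sentence_id": 0,
        "text": "Team, I know that times are tough!",
        "tones": [
            {"score": 0.801827, "tone_id": "analytical", "tone_name": "Analytical"},
        ],
    },
    {
        "sentence_id": 1,
        "text": "Product sales have been disappointing for the past three quarters.",
        "tones": [
            {"score": 0.771241, "tone_id": "sadness", "tone_name": "Sadness"},
            {"score": 0.687768, "tone_id": "analytical", "tone_name": "Analytical"},
        ],
    },
]

# record_path expands each entry of 'tones' into its own row;
# meta copies the parent sentence's fields onto every such row.
sentence_tones = pd.json_normalize(
    sentences_tone,
    record_path="tones",
    meta=["sentence_id", "text"],
)

# Put the sentence columns first (column order is only a presentation choice).
sentence_tones = sentence_tones[
    ["sentence_id", "text", "tone_id", "tone_name", "score"]
]
print(sentence_tones)
```

A sentence with two tones appears as two rows that share the same sentence_id and text. A sentence whose tones list is empty contributes no rows, so it would need separate handling if it has to appear in the result.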

I have also created a REPL where you can change the input and run the code in your browser: https://repl.it/@aficionado/DarkturquoiseUnnaturalDistributeddatabase

For more information, see the Kaggle article on flattening JSON in Python using Pandas.