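I am using IBM Watson's Tone Analyzer to analyze text, and I am trying to extract all the information related to sentence tone (e.g. sentence_id, text, tones, score, tone_id, tone_name) and add it to a dataframe (with columns sentence_id, text, tones, score, tone_id and tone_name). Here is a sample of my output:
This is the code I wrote to get this output: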
Answer 0 (score: 0)
First, because your JSON is not well-formed, I used the sample JSON from the Tone Analyzer API reference instead. Using that JSON and pandas json_normalize, this is the code I came up with:
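# json_normalize is exposed at the top level since pandas 1.0;
# the old pandas.io.json import path is deprecated.
from pandas import json_normalize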
jsonfile = {
    "document_tone": {
        "tones": [
            {
                "score": 0.6165,
                "tone_id": "sadness",
                "tone_name": "Sadness"
            },
            {
                "score": 0.829888,
                "tone_id": "analytical",
                "tone_name": "Analytical"
            }
        ]
    },
    "sentences_tone": [
        {
            "sentence_id": 0,
            "text": "Team, I know that times are tough!",
            "tones": [
                {
                    "score": 0.801827,
                    "tone_id": "analytical",
                    "tone_name": "Analytical"
                }
            ]
        },
        {
            "sentence_id": 1,
            "text": "Product sales have been disappointing for the past three quarters.",
            "tones": [
                {
                    "score": 0.771241,
                    "tone_id": "sadness",
                    "tone_name": "Sadness"
                },
                {
                    "score": 0.687768,
                    "tone_id": "analytical",
                    "tone_name": "Analytical"
                }
            ]
        },
        {
            "sentence_id": 2,
            "text": "We have a competitive product, but we need to do a better job of selling it!",
            "tones": [
                {
                    "score": 0.506763,
                    "tone_id": "analytical",
                    "tone_name": "Analytical"
                }
            ]
        }
    ]
}
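# One row per sentence; the nested "tones" list stays in a single column.
mydata = json_normalize(jsonfile['sentences_tone'])
print(mydata)

# One row per tone: record_path='tones' expands each sentence's tone list.
tones_data = json_normalize(data=jsonfile['sentences_tone'], record_path='tones')
print(tones_data)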
The output dataframes will be:
   sentence_id  ...                                              tones
0            0  ...  [{'score': 0.801827, 'tone_id': 'analytical', ...
1            1  ...  [{'score': 0.771241, 'tone_id': 'sadness', 'to...
2            2  ...  [{'score': 0.506763, 'tone_id': 'analytical', ...

[3 rows x 3 columns]

      score     tone_id   tone_name
0  0.801827  analytical  Analytical
1  0.771241     sadness     Sadness
2  0.687768  analytical  Analytical
3  0.506763  analytical  Analytical
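Note that the `record_path='tones'` result above has lost `sentence_id` and `text`, which the question wanted in the same dataframe. One possible way to keep them, sketched here under the assumption that every sentence entry carries a `tones` list, is to pass the `meta` argument of `json_normalize`. The variable names `response` and `flat` are only illustrative, and the data is a trimmed copy of the API reference sample.

```python
import pandas as pd

# Trimmed stand-in for the API reference response (illustrative name, not from the original).
response = {
    "sentences_tone": [
        {
            "sentence_id": 0,
            "text": "Team, I know that times are tough!",
            "tones": [
                {"score": 0.801827, "tone_id": "analytical", "tone_name": "Analytical"}
            ],
        },
        {
            "sentence_id": 1,
            "text": "Product sales have been disappointing for the past three quarters.",
            "tones": [
                {"score": 0.771241, "tone_id": "sadness", "tone_name": "Sadness"},
                {"score": 0.687768, "tone_id": "analytical", "tone_name": "Analytical"},
            ],
        },
    ]
}

# record_path expands each tone into its own row;
# meta repeats the parent sentence fields on every one of those rows.
flat = pd.json_normalize(
    response["sentences_tone"],
    record_path="tones",
    meta=["sentence_id", "text"],
)

# Put the sentence-level columns first for readability.
flat = flat[["sentence_id", "text", "score", "tone_id", "tone_name"]]
print(flat)
```

Each tone becomes one row, so a sentence with two tones appears twice, and a sentence whose `tones` list is empty would produce no rows at all.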
I have also created a REPL so you can change the input and run the code in your browser: https://repl.it/@aficionado/DarkturquoiseUnnaturalDistributeddatabase
See this Kaggle link to learn more about flattening JSON in Python using Pandas.