Question

Below is the code I use to collect some data through IBM's API. However, I am having trouble saving the output to a CSV table with Python.

These are the columns (and their values) that I want:

emotion__document__emotion__anger   emotion__document__emotion__joy
emotion__document__emotion__sadness emotion__document__emotion__fear    
emotion__document__emotion__disgust sentiment__document__score  
sentiment__document__label  language    entities__relevance 
entities__text  entities__type  entities__count concepts__relevance
concepts__text  concepts__dbpedia_resource  usage__text_characters
usage__features usage__text_units   retrieved_url

This is the code I use to collect the data:

response = natural_language_understanding.analyze(
  url=url,
  features=[
  Features.Emotion(),
  Features.Sentiment(),
  Features.Concepts(limit=1),
  Features.Entities(limit=1)
          ]
  )


data = json.load(response)
rows_list = []
cols = []

for ind,row in enumerate(data):

    if ind == 0:
        cols.append(["usage__{}".format(i) for i in row["usage"].keys()])
        cols.append(["emotion__document__emotion__{}".format(i) for i in row["emotion"]["document"]["emotion"].keys()])
        cols.append(["sentiment__document__{}".format(i) for i in row["sentiment"]["document"].keys()])
        cols.append(["concepts__{}".format(i) for i in row["concepts"].keys()])
        cols.append(["entities__{}".format(i) for i in row["entities"].keys()])
        cols.append(["retrieved_url"])

    d = OrderedDict()


    d.update(row["usage"])
    d.update(row["emotion"]["document"]["emotion"])
    d.update(row["sentiment"]["document"])
    d.update(row["concepts"])
    d.update(row["entities"])
    d.update({"retrieved_url":row["retrieved_url"]})

    rows_list.append(d)


df = pd.DataFrame(rows_list)
df.columns = [i for subitem in cols for i in subitem]
df.to_csv("featuresoutput.csv", index=False)

Changing these two lines to

cols.append(["concepts__{}".format(i) for i in row["concepts"][0].keys()])
cols.append(["entities__{}".format(i) for i in row["entities"][0].keys()])

did not solve the problem.

Answer 1

This line assigns a string to data:

data=(json.dumps(datas, indent=2))

So here you are iterating over the characters of that string:

for ind,row in enumerate(data):

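In that case, row will be a string, not a dictionary. So, for example, row["usage"] will raise an error (a TypeError, because string indices must be integers).

Perhaps you meant to iterate over datas?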
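To make the difference concrete, here is a minimal sketch. The contents of `datas` are a made-up stand-in for the parsed API results, not real output:

```python
import json

# Hypothetical stand-in for the parsed API responses (one dict per URL).
datas = [{"usage": {"text_units": 1, "features": 4},
          "retrieved_url": "http://example.com"}]

data = json.dumps(datas, indent=2)      # a str, not a list of dicts

first_from_string = next(iter(data))    # a single character of the JSON text
first_from_list = next(iter(datas))     # the original dict

print(type(data).__name__, repr(first_from_string))
print(first_from_list["usage"])

# Indexing a one-character string with a key fails.
try:
    first_from_string["usage"]
except TypeError as exc:
    print("indexing a character fails:", exc)
```

Iterating over `datas` yields dictionaries, so the `row["usage"]` lookups in the loop can work.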

Update

The code has some other problems as well, for example:

cols.append(["concepts__{}".format(i) for i in row["concepts"].keys()])

In this case you want row["concepts"][0].keys() to get the keys of the first element, because row["concepts"] is an array.
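A small sketch of that point, using a hypothetical `row` whose shape follows the columns listed in the question (the values are invented):

```python
# Hypothetical row shaped like one analysis result; values are made up.
row = {
    "concepts": [{"text": "Python", "relevance": 0.9,
                  "dbpedia_resource": "http://dbpedia.org/resource/Python"}],
    "entities": [{"type": "Company", "text": "IBM",
                  "relevance": 0.8, "count": 2}],
}

# Index the first list element before asking for its keys.
concept_cols = ["concepts__{}".format(k) for k in row["concepts"][0].keys()]
entity_cols = ["entities__{}".format(k) for k in row["entities"][0].keys()]
print(concept_cols)
print(entity_cols)

# A list has no .keys() method, which is why the unindexed version fails.
try:
    row["concepts"].keys()
except AttributeError as exc:
    print(exc)
```

The same indexing is presumably needed wherever the row values are read, for instance in the d.update(row["concepts"]) and d.update(row["entities"]) lines, since those also receive a list rather than a dict.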

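I am not very familiar with pandas, but I suggest you look at json_normalize, which is included in pandas and can help flatten JSON structures. One problem you may face is that concepts and entities contain arrays of documents. This means you would have to include the same document at least max(len(concepts), len(entities)) times.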
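As a rough sketch of the json_normalize idea, assuming pandas 1.0 or later (where it is exposed as `pd.json_normalize`) and assuming each response has at most one concept and one entity, as with limit=1 in the question. The `row` values and the `first_item_only` helper are invented for illustration:

```python
import pandas as pd

# Hypothetical response for one URL; the shape follows the question's
# column names, the values are made up.
row = {
    "usage": {"text_units": 1, "text_characters": 1500, "features": 4},
    "language": "en",
    "retrieved_url": "http://example.com",
    "emotion": {"document": {"emotion": {
        "anger": 0.1, "joy": 0.6, "sadness": 0.2, "fear": 0.05, "disgust": 0.05}}},
    "sentiment": {"document": {"score": 0.4, "label": "positive"}},
    "concepts": [{"text": "Python", "relevance": 0.9,
                  "dbpedia_resource": "http://dbpedia.org/resource/Python"}],
    "entities": [{"type": "Company", "text": "IBM", "relevance": 0.8, "count": 2}],
}


def first_item_only(row):
    """Replace the concepts/entities lists with their first element (helper name is illustrative)."""
    flat = dict(row)
    for key in ("concepts", "entities"):
        items = flat.get(key) or [{}]
        flat[key] = items[0]
    return flat


# sep="__" joins nested keys the way the desired column names are written.
df = pd.json_normalize([first_item_only(row)], sep="__")
csv_text = df.to_csv(index=False)  # pass a file name instead to write to disk
print(sorted(df.columns))
```

With more than one concept or entity per document, the rest of the row would have to be repeated for each list element, as described above; this sketch does not handle that case.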

Answer 2

If you fetch it from the API, the response will be in JSON format. You can write it out to CSV like this:

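import csv, json

response = ...  # placeholder: the JSON response you get from the API
attributes = ["emotion__document__emotion__anger", "emotion__document__emotion__joy"]  # ...plus the other attributes you want
data = json.load(response)  # for a file-like response; use json.loads(response) if it is a string
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    for attribute in attributes:
        # assumes data is a flat dict keyed by attribute name; writes one "name,value" row each
        writer.writerow([attribute, data[attribute]])
    # no f.close() needed: the with block closes the file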

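Make sure data is a dict and not a string; in Python 3.6, json.load should return a dictionary. Print a few entries to see how the data you need is stored.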
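A minimal standard-library sketch of this approach. It assumes the response may arrive either as a JSON string or as an already parsed dict, and that list values (such as entities) hold at most one item. The sample `raw` text and the `flatten` helper are invented for illustration:

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with '__'-joined keys.

    Lists are reduced to their first item (an assumption that fits limit=1).
    """
    flat = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, list):
            value = value[0] if value else {}
        if isinstance(value, dict):
            flat.update(flatten(value, name + "__"))
        else:
            flat[name] = value
    return flat


# Made-up sample response text.
raw = ('{"language": "en", '
       '"sentiment": {"document": {"score": 0.4, "label": "positive"}}, '
       '"entities": [{"text": "IBM", "count": 2}]}')

# Parse only if we were handed a string; a dict is used as-is.
data = json.loads(raw) if isinstance(raw, str) else raw
row = flatten(data)

buffer = io.StringIO()  # swap for open("output.csv", "w", newline="") to write a file
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```

Printing `row` before writing is a quick way to see which flattened keys are available for use as column names.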

Python Response API JSON to CSV table
