Error when iterating over JSON results from the IBM Watson API in Python

Asked: 2017-12-30 18:53:57

Tags: python json loops jinja2 ibm-watson

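I am using IBM Watson's Natural Language Understanding API to extract keywords and entities from a URL. I want to iterate over the JSON response to get all the keywords and entities and populate my results.html file with them.

I am trying to iterate over the results both in my application.py file and in my results.html file, which uses Jinja.

The helpers.py file returns the output of json.dumps and sends it to my application.py file so that I can iterate over the results.

However, I get the following error: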

TypeError: string indices must be integers

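I have read up on json.dump vs. json.load, and on strings vs. dictionaries, to try to solve this, but I cannot get the code to work. Let me know if you need more information. I need to solve this before the end of the year. Thanks in advance.

Here is my application.py file: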

@app.route("/URL", methods=["GET", "POST"])
def URL():
    """Analyze URL."""

    # if user reached route via POST (as by submitting a form via POST)
    if request.method == "POST":

        # if nothing was entered return apology
        if not request.form.get("URL"):
            return apology("please enter a URL")
        URL = request.form.get("URL")

        # analyze URL using analyze function in helpers.py
        results = analyze(request.form.get("URL"))

        for item in results:
            keywords = item["keywords"]["text"]
            entities = item["entities"]["text"]

        return render_template("results.html", results=results, URL=URL)

        # check if URL is valid
        if not results:
            return apology("this is not a valid URL")

    else:
        return render_template("url.html")

Here is the helpers.py file:

def analyze(URL):

    natural_language_understanding = NaturalLanguageUnderstandingV1(
        version='2017-02-27',
        username='MUSTGETYOURUSERNAME',
        password='MUSTGETYOURPASSWORD')

    response = natural_language_understanding.analyze(
        url=URL,
        features=Features(entities=EntitiesOptions(), keywords=KeywordsOptions()))

    return (json.dumps(response, indent=2))

Here is the results.html file, which uses Jinja:

{% extends "layout.html" %}

{% block title %}
Results
{% endblock %}

{% block main %}
        <h2>Powered by IBM Watson's AI to recommend your #'s and @'s 
for tweeting</h2>
        <p>{{URL}}</p>
         {% for item in results %}
            <tr>
                <td>{{ item.keywords }}</td>
                <td>{{ item.entities }}</td>
            </tr>
        {% endfor %}

        <a class="twitter-share-button" 
href="https://twitter.com/intent/tweet">Tweet</a>
{% endblock %}

Here is the output:

[
  {
    "text": "Android apps",
    "relevance": 0.926516
  },
  {
    "text": "Chrome OS",
    "relevance": 0.878045
  },
  {
    "text": "Sorry Android fanboys",
    "relevance": 0.696885
  },
  {
    "text": "Android tablet",
    "relevance": 0.695471
  },
  {
    "text": "absolutely wonderful Android",
    "relevance": 0.672889
  },
  {
    "text": "Chrome OS beta",
    "relevance": 0.626619
  },
  {
    "text": "Android Police",
    "relevance": 0.592994
  },
  {
    "text": "Chrome OS devices",
    "relevance": 0.566831
  },
  {
    "text": "count Android",
    "relevance": 0.563911
  },
  {
    "text": "dominant Google OS",
    "relevance": 0.553724
  },
  {
    "text": "Chrome Unboxed",
    "relevance": 0.540076
  },
  {
    "text": "overall tablet sales",
    "relevance": 0.511826
  },
  {
    "text": "inexpensive Google rival",
    "relevance": 0.498259
  },
  {
    "text": "half incremental improvements",
    "relevance": 0.468663
  },
  {
    "text": "standard operating procedure",
    "relevance": 0.45946
  },
  {
    "text": "uncommon Chromebook form",
    "relevance": 0.456969
  },
  {
    "text": "content consumption machines",
    "relevance": 0.451775
  },
  {
    "text": "absolute best pieces",
    "relevance": 0.450763
  },
  {
    "text": "content creation ones",
    "relevance": 0.450345
  },
  {
    "text": "rich new fusion",
    "relevance": 0.446127
  },
  {
    "text": "Amazon Fire tablet",
    "relevance": 0.444685
  },
  {
    "text": "selling tablet",
    "relevance": 0.444241
  },
  {
    "text": "tablet operating",
    "relevance": 0.440434
  },
  {
    "text": "Google Pixelbook",
    "relevance": 0.440007
  },
  {
    "text": "Google store",
    "relevance": 0.439719
  },
  {
    "text": "cheap tablets",
    "relevance": 0.408395
  },
  {
    "text": "immortal highlander",
    "relevance": 0.404233
  },
  {
    "text": "disparate OSes",
    "relevance": 0.401626
  },
  {
    "text": "laptop space",
    "relevance": 0.40117
  },
  {
    "text": "detachable two-in-one",
    "relevance": 0.396257
  },
  {
    "text": "pleasant surprises",
    "relevance": 0.394027
  },
  {
    "text": "additional oomph",
    "relevance": 0.393127
  },
  {
    "text": "Samsung",
    "relevance": 0.391534
  },
  {
    "text": "flashy Chromebook",
    "relevance": 0.391359
  },
  {
    "text": "sleek Chromebook",
    "relevance": 0.390035
  },
  {
    "text": "smaller devices",
    "relevance": 0.389106
  },
  {
    "text": "operating systems",
    "relevance": 0.388958
  },
  {
    "text": "new feature",
    "relevance": 0.388395
  },
  {
    "text": "true multitasking",
    "relevance": 0.388097
  },
  {
    "text": "tablet-like device",
    "relevance": 0.387175
  },
  {
    "text": "two-in-one Chromebook",
    "relevance": 0.385518
  },
  {
    "text": "nightmare fuel",
    "relevance": 0.385284
  },
  {
    "text": "mouse-first OS—not",
    "relevance": 0.385193
  },
  {
    "text": "parallel tasks",
    "relevance": 0.381923
  },
  {
    "text": "budget device",
    "relevance": 0.380932
  },
  {
    "text": "iPad",
    "relevance": 0.35313
  },
  {
    "text": "news",
    "relevance": 0.333007
  },
  {
    "text": "strides",
    "relevance": 0.319667
  },
  {
    "text": "iOS",
    "relevance": 0.318235
  },
  {
    "text": "thanks",
    "relevance": 0.316534
  }
]

[
  {
    "type": "Company",
    "text": "Google",
    "relevance": 0.385564,
    "disambiguation": {
      "subtype": [
        "AcademicInstitution",
        "AwardPresentingOrganization",
        "OperatingSystemDeveloper",
        "ProgrammingLanguageDeveloper",
        "SoftwareDeveloper",
        "VentureFundedCompany"
      ],
      "name": "Google",
      "dbpedia_resource": "http://dbpedia.org/resource/Google"
    },
    "count": 9
  },
  {
    "type": "Company",
    "text": "Samsung",
    "relevance": 0.204475,
    "disambiguation": {
      "subtype": [],
      "name": "Samsung",
      "dbpedia_resource": "http://dbpedia.org/resource/Samsung"
    },
    "count": 4
  },
  {
    "type": "Location",
    "text": "Chromebooks",
    "relevance": 0.129986,
    "disambiguation": {
      "subtype": [
        "City"
      ]
    },
    "count": 2
  },
  {
    "type": "Company",
    "text": "Amazon",
    "relevance": 0.119948,
    "disambiguation": {
      "subtype": [],
      "name": "Amazon.com",
      "dbpedia_resource": "http://dbpedia.org/resource/Amazon.com"
    },
    "count": 2
  },
  {
    "type": "Location",
    "text": "US",
    "relevance": 0.109124,
    "disambiguation": {
      "subtype": [
        "Region",
        "AdministrativeDivision",
        "GovernmentalJurisdiction",
        "FilmEditor",
        "Country"
      ],
      "name": "United States",
      "dbpedia_resource": "http://dbpedia.org/resource/United_States"
    },
    "count": 1
  },
  {
    "type": "Company",
    "text": "Apple",
    "relevance": 0.108271,
    "disambiguation": {
      "subtype": [
        "Brand",
        "OperatingSystemDeveloper",
        "ProcessorManufacturer",
        "ProgrammingLanguageDesigner",
        "ProgrammingLanguageDeveloper",
        "ProtocolProvider",
        "SoftwareDeveloper",
        "VentureFundedCompany",
        "VideoGameDeveloper",
        "VideoGamePublisher"
      ],
      "name": "Apple Inc.",
      "dbpedia_resource": "http://dbpedia.org/resource/Apple_Inc."
    },
    "count": 1
  },
  {
    "type": "Quantity",
    "text": "$500",
    "relevance": 0.0746897,
    "count": 1
  },
  {
    "type": "Quantity",
    "text": "$50",
    "relevance": 0.0746897,
    "count": 1
  }
]

1 Answer:

Answer 0 (score: 0)

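There is some confusion here. You are not handling the Watson API response particularly well, and you also seem to have misunderstood the shape of that response.

Your analyze function handles the response from the Watson API call. The Watson API client helpfully parses the JSON returned from the server into Python objects such as lists and dicts. However, your code then calls json.dumps to convert it back into a string, which undoes some of the work the Watson API has done for you. Don't call json.dumps on response; just return response.

(I'm guessing you got this from the official IBM documentation: it has the same call to json.dumps, but that Python code sample is for demonstration purposes only.)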
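To make the difference concrete, here is a minimal sketch using only the standard json module. The response dict is a hypothetical stand-in for the already-parsed Watson result, not real SDK output:

```python
import json

# Hypothetical stand-in for the already-parsed Watson response (a plain dict).
response = {
    "keywords": [{"text": "Android apps", "relevance": 0.926516}],
    "entities": [{"type": "Company", "text": "Google", "count": 9}],
}

as_text = json.dumps(response, indent=2)  # serializes the dict into a str
round_trip = json.loads(as_text)          # parses that str back into a dict

print(type(response).__name__)  # the parsed object you can index by key
print(type(as_text).__name__)   # what analyze() currently returns
print(round_trip == response)   # json.loads recovers the original structure
```

If some other caller really does need the text form, it can call json.dumps itself; the function that talks to the API is usually better off returning the parsed object.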

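This explains the error you are getting: results is therefore a string, so when you iterate over it with for item in results, each item is a one-character string, and indexing a string with a key such as "keywords" raises the TypeError.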
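The failure can be reproduced without Watson at all. This is a small sketch with made-up data; only the json module is used:

```python
import json

# A JSON string, like the one analyze() returns after json.dumps.
results = json.dumps({"keywords": [{"text": "Android apps"}]})

# Iterating over a str yields one-character strings.
first_items = list(results)[:3]
print(first_items)

try:
    first_items[0]["keywords"]  # indexing a str with a str key
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message varies slightly between Python versions, but it begins with "string indices must be integers".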

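However, making this change is not enough to make the code work. Next, we have to look at the loop where the error occurs, because it still has a problem: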

    for item in results:
        keywords = item["keywords"]["text"]
        entities = item["entities"]["text"]

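This code treats results as a list, in which each item is a dictionary with a keywords property and an entities property. In other words, it assumes the data looks something like this: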

[
    {
        "keywords": { "text": "abc123", ... },
        "entities": { "text": "def456", ... },
        ...
    },
    {
        "keywords": { "text": "cba321", ... },
        "entities": { "text": "fed654", ... },
        ...
    },
    ...
]

However, if you look closely at the IBM documentation (the same page as linked above), the response comes back in a different shape. It looks more like this:

{
    "keywords": [
        { "text": "abc123", ... },
        { "text": "def456", ... }
    ],
    "entities": [
        { "text": "ghi789", ... }
    ],
    ...
}

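In particular, keywords and entities are separate lists under the top-level object/dictionary, and they may have different lengths.

Instead of the loop above, you probably want something more like the following: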

    for item in results["keywords"]:
        keyword_text = item["text"]

    for item in results["entities"]:
        entity_text = item["text"]
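As one possible way to actually keep what those loops read, the texts can be collected into two lists. This is a sketch with hypothetical sample data in the shape described above; the names keyword_texts and entity_texts are illustrative, not from the original code:

```python
# Hypothetical sample in the documented shape: one dict holding two lists.
results = {
    "keywords": [
        {"text": "Android apps", "relevance": 0.926516},
        {"text": "Chrome OS", "relevance": 0.878045},
    ],
    "entities": [
        {"type": "Company", "text": "Google", "relevance": 0.385564, "count": 9},
    ],
}

# .get(..., []) keeps the code working if a feature is missing from the response.
keyword_texts = [item["text"] for item in results.get("keywords", [])]
entity_texts = [item["text"] for item in results.get("entities", [])]

print(keyword_texts)
print(entity_texts)
```

The two lists could then be passed to render_template as separate arguments, so the template can loop over each one independently.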

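However, I'm not sure what your original loop was supposed to do: it takes data out of your Watson response and then does nothing with it.

You will also need to modify your Jinja template to contain two separate loops. I'll leave that to you.

Finally, you wrote the following code: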

    return render_template("results.html", results=results, URL=URL) 

    # check if URL is valid
    if not results:
        return apology("this is not a valid URL")

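After we have returned, it is too late to check whether the URL is valid! The check (if not results) is unreachable and will never run. Move this check to immediately after the line results = analyze(...).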
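The reordered control flow might look roughly like this. It is a framework-free sketch: handle_url and fake_analyze are hypothetical names, the tuples stand in for the apology and render_template calls, and it assumes analyze returns something falsy for a URL it cannot process (the real SDK may raise an exception instead):

```python
def handle_url(url, analyze):
    """Return a (kind, payload) pair describing what the route would render."""
    if not url:
        return ("apology", "please enter a URL")

    results = analyze(url)

    # Validate before returning the page, so this check is reachable.
    if not results:
        return ("apology", "this is not a valid URL")

    return ("results", results)


def fake_analyze(url):
    # Hypothetical stub: pretend only one URL can be analyzed.
    if url == "https://example.com":
        return {"keywords": [{"text": "example"}], "entities": []}
    return None


print(handle_url("", fake_analyze))
print(handle_url("not-a-url", fake_analyze))
print(handle_url("https://example.com", fake_analyze)[0])
```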