Python Elasticsearch: using the response from search_exists

Time: 2015-03-18 21:03:49

Tags: python elasticsearch

I'm trying to take a list of URLs from a text file and check whether they are already stored in elasticsearch. Here's the code:

import fileinput
import sys
import urllib2
import os
from urlparse import urlparse
from elasticsearch import Elasticsearch

es = Elasticsearch()

#rewrite the file in place, keeping only lines longer than 4 characters (newline included)

for line_number, line in enumerate(fileinput.input('bangersandmash_items.csv', inplace=1)):
    if len(line) > 4:
        sys.stdout.write(line)


#open file to load URLs

with open('bangersandmash_items.csv') as urls:
    for line in urls:

        #strip out http:// as this seems to cause elasticsearch to return no results

        url = line.rstrip()
        prefix = 'http://'
        if url.startswith(prefix):
            url = url[len(prefix):]

        #query elasticsearch to see if url already exists in library's 'link' field

        response = es.search_exists(index="websearch", doc_type="site", body={"query": {"match_phrase": {"link": url}}}, ignore=[400, 404])
        print url
        print response

        #Is url in library?

        if response == "{u'exists': true}":
            print url
            print "bingo!"
        else:
            print url
            print "nuthin."

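It prints the URLs as formatted by lines 19-22 of my script (the part that strips the http:// prefix), but it doesn't seem to be able to process the error code. Lines 25 and 26 (the two print statements) print the URL and the response from elasticsearch. Lines 28-33 (the final if/else) don't seem to act on that information correctly. Any ideas about what I'm doing wrong here?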
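A quick way to check the final comparison in isolation is sketched below. It assumes the client hands back a Python dict such as `{u'exists': True}`; the `response` value here is a hard-coded stand-in, not a real call to elasticsearch.

```python
# Hypothetical stand-in for what es.search_exists(...) returns: a dict, not a string.
response = {u'exists': True}

# A dict never compares equal to a string, so this test is always False
# and the else branch would always run.
matches_string = (response == "{u'exists': true}")

# Reading the value out of the dict gives a real boolean.
exists_value = response['exists']

print(matches_string)
print(exists_value)
```

If the first value printed is False while the second is True, the lookup itself worked and only the comparison needs changing.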

1 Answer:

Answer 0 (score: 0)

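Figured it out. I had to adjust the if/else statement so that the response from elasticsearch is read from the dictionary and converted to a string: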

state = str(response['exists'])
if state == 'True':
    print url
    print "bingo!"
[etc].
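Since `response['exists']` is already a boolean, the string conversion can probably be skipped. The sketch below is one possible variant, not the method used above: `url_exists` is a hypothetical helper (not part of elasticsearch-py), and it assumes that a response ignored via `ignore=[400, 404]` may come back without an `exists` key.

```python
def url_exists(response):
    """Return True only when the response dict reports exists == True.

    `url_exists` is a hypothetical helper, not part of elasticsearch-py.
    """
    # Anything that is not a dict cannot carry the 'exists' flag.
    if not isinstance(response, dict):
        return False
    # Assumption: an ignored 400/404 response may lack the 'exists' key,
    # so fall back to False instead of raising KeyError.
    return response.get('exists', False) is True


# Hard-coded sample responses standing in for real search_exists results.
samples = ({'exists': True}, {'exists': False}, {'error': 'bad request', 'status': 400})
for sample in samples:
    print("bingo!" if url_exists(sample) else "nuthin.")
```

In the loop from the question, the check would then read `if url_exists(response):`, with the rest of the if/else unchanged.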