I'm trying to take a list of URLs from a text file and check whether they are already stored in Elasticsearch. Here is the code:
import fileinput
import sys
import urllib2
import os
from urlparse import urlparse
from elasticsearch import Elasticsearch
es = Elasticsearch()
for line_number, line in enumerate(fileinput.input('bangersandmash_items.csv', inplace=1)):
    if len(line) > 4:
        sys.stdout.write(line)
#open file to load URLs
with open('bangersandmash_items.csv') as urls:
    for line in urls:
        #strip out http:// as this seems to cause elasticsearch to return no results
        url = line.rstrip()
        prefix = 'http://'
        if url.startswith(prefix):
            url = url[len(prefix):]
        #query elasticsearch to see if url already exists in library's 'link' field
        response = es.search_exists(index="websearch", doc_type="site", body={"query": {"match_phrase": {"link": url}}}, ignore=[400, 404])
        print url
        print response
        #Is url in library?
        if response == "{u'exists': true}":
            print url
            print "bingo!"
        else:
            print url
            print "nuthin."
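It prints the URLs after the http:// prefix has been stripped, but it doesn't seem to handle the response correctly. The two print statements after the query print the URL and the response from Elasticsearch. The if/else block that follows doesn't seem to act on that information correctly. Any ideas what I'm doing wrong here?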
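A quick way to see why the if branch never fires: the answer below indexes the response with response['exists'], so in the client version used here the response is a Python dict. Comparing a dict with a string is never equal. A minimal illustration, using a hand-written dict in place of a real Elasticsearch response (its exact shape is an assumption based on the string the question compares against):

```python
# Stand-in for the parsed response of es.search_exists(); the shape is assumed.
response = {u'exists': True}

# The question's check compares a dict with a string, which is never equal.
print(response == "{u'exists': true}")  # False

# Even the dict's string form does not match: Python writes True, not true.
print(str(response) == "{u'exists': true}")  # False

# The value to test lives under the 'exists' key.
print(response['exists'])  # True
```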
Answer (score: 0)
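Figured it out. I had to adjust the if/else statement so that the 'exists' value is read out of Elasticsearch's response dictionary and converted to a string before comparing: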
state = str(response['exists'])
if state == 'True':
    print url
    print "bingo!"
else:
    print url
    print "nuthin."
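Converting the value to a string and comparing it with 'True' works, but testing the boolean directly is simpler. Using dict.get also avoids a KeyError if a response ever lacks an 'exists' key. That could happen, for example, with an error body returned for one of the status codes passed in ignore=[400, 404]. The sketch below assumes, as the answer does, that the response is a dict. The names strip_http, url_in_index and check_urls are hypothetical helpers. FakeES is a stand-in for the real client so the example runs without a cluster; it only mimics the call shape used in the question.

```python
def strip_http(url):
    """Remove trailing whitespace and a leading 'http://', as the question does."""
    prefix = 'http://'
    url = url.rstrip()
    if url.startswith(prefix):
        url = url[len(prefix):]
    return url


def url_in_index(response):
    """Return True only if the response dict reports a match.

    Assumes `response` is a dict like {u'exists': True}; .get() returns None
    instead of raising KeyError when the 'exists' key is missing.
    """
    return isinstance(response, dict) and response.get('exists') is True


class FakeES(object):
    """Hypothetical stand-in for the Elasticsearch client, for illustration only."""

    def __init__(self, stored_links):
        self.stored_links = set(stored_links)

    def search_exists(self, index=None, doc_type=None, body=None, ignore=None):
        # Pull the URL back out of the match_phrase query built by the caller.
        link = body['query']['match_phrase']['link']
        return {u'exists': link in self.stored_links}


def check_urls(es, lines):
    """Yield (url, found) for each input line, mirroring the question's loop."""
    for line in lines:
        url = strip_http(line)
        response = es.search_exists(index="websearch", doc_type="site",
                                    body={"query": {"match_phrase": {"link": url}}},
                                    ignore=[400, 404])
        yield url, url_in_index(response)


if __name__ == '__main__':
    es = FakeES(['example.com/a'])
    for url, found in check_urls(es, ['http://example.com/a\n', 'http://example.com/b\n']):
        print(url)
        print("bingo!" if found else "nuthin.")
```

With the real client, you would pass the Elasticsearch() instance instead of FakeES and the lines of the CSV file instead of the hard-coded list.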