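I want to translate a Python method into an Elasticsearch query that converts specific terms taken from a scraped website.
I'm doing an internship on web scraping and Elasticsearch (among other things), and I'm new to this field (and to programming in general).
My task is to scrape country codes and then write a query that retrieves one country code from another, for example:
Australia's 2-character country code is "AU" and its 3-character country code is "AUS".
So, by specifying "AU", I want to get back the "AUS" code.
To do this, I scraped the list of codes for all countries and wrote Python code to get this result; here is an example: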
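So basically I want to convert the code above into a query and then put it on a web page for internal use.
I'm a beginner, so please be as explicit as possible.
Answer 0 (score: 0)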
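Assuming the default dynamic mapping was used when the documents were indexed, every string field should be mapped both as a text type and as a keyword type (the keyword version is a sub-field named <field>.keyword). A simple term query on the keyword mapping will therefore give you the result you want.
For example, creating an index with the default settings is as simple as:
PUT countries-codes
Indexing the document you provided then looks like this:
POST countries-codes/event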
{
  "name": "Albanie",
  "alpha_2": "AL",
  "alpha_3": "ALB",
  "num": "8"
}
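Since the codes were scraped with Python, the same indexing step could be sketched there as well. The snippet below is only an illustration under assumptions: the row shape (name, alpha_2, alpha_3, num) and the sample rows are hypothetical, and the "_type" key mirrors the event type used above, which newer Elasticsearch versions no longer accept. It builds action dicts in the format that elasticsearch.helpers.bulk takes, without contacting a cluster.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

INDEX_NAME = "countries-codes"  # index name taken from the example above
DOC_TYPE = "event"              # mapping type used above; drop it on newer clusters

# Hypothetical shape of one scraped row: (name, alpha_2, alpha_3, num).
Row = Tuple[str, str, str, str]


def generate_actions(rows: Iterable[Row]) -> Iterator[Dict[str, Any]]:
    """Yield one bulk action per scraped row."""
    for name, alpha_2, alpha_3, num in rows:
        yield {
            "_index": INDEX_NAME,
            "_type": DOC_TYPE,
            "_source": {
                "name": name,
                "alpha_2": alpha_2,
                "alpha_3": alpha_3,
                "num": num,
            },
        }


if __name__ == "__main__":
    # Sample rows for illustration only.
    scraped = [("Albanie", "AL", "ALB", "8"), ("Australie", "AU", "AUS", "36")]
    for action in generate_actions(scraped):
        print(action)
    # With a reachable cluster (the URL is an assumption) the actions could be sent with:
    #   from elasticsearch import Elasticsearch, helpers
    #   helpers.bulk(Elasticsearch("http://localhost:9200"), generate_actions(scraped))
```

Keeping the action generation separate from the network call makes it easy to check the documents before anything is written to the index.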
We can now look at the index's mapping to see how Elasticsearch mapped the fields internally:
GET countries-codes/_mapping
Result:
{
  "countries-codes" : {
    "mappings" : {
      "event" : {
        "properties" : {
          "alpha_2" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "alpha_3" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "num" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}
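Now we only need to run a term query against the keyword mapping of the 2-character country code, and we get back the single document that matches (or, if there were somehow several matches, all of the documents that match):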
GET countries-codes/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "alpha_2.keyword": "AL"
        }
      }
    }
  }
}
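Note that this is a filtered query, because you are not interested in scoring. In short, filter context skips score calculation and its results can be cached, so it is generally faster than query context; use it whenever you can. For more information, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html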
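For the Python side, the same request body can be built as a plain dict. This is a minimal sketch rather than a definitive implementation: the function name is made up for illustration, and upper-casing the input assumes the stored codes are upper-case, which matters because a term query on a keyword field is an exact, case-sensitive match.

```python
import json
from typing import Any, Dict


def build_alpha2_filter(alpha_2: str) -> Dict[str, Any]:
    """Build a search body that matches alpha_2 exactly, in filter context."""
    # keyword fields are not analyzed, so "al" would not match "AL";
    # normalizing here assumes the indexed codes are upper-case.
    code = alpha_2.strip().upper()
    return {
        "query": {
            "bool": {
                "filter": {
                    "term": {"alpha_2.keyword": code}
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_alpha2_filter(" au "), indent=2))
```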
The query returns the document you posted earlier, inside the hits array of the response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "countries-codes",
        "_type" : "event",
        "_id" : "qGDmEWoBqkB-aMRpdfvt",
        "_score" : 0.0,
        "_source" : {
          "name" : "Albanie",
          "alpha_2" : "AL",
          "alpha_3" : "ALB",
          "num" : "8"
        }
      }
    ]
  }
}
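Any value you submit that has no match will produce an empty hits array. On the client side, you can parse out only the elements you need. If your documents are very large, or you are returning a large number of them, you will want to look at source filtering - https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-source-filtering.html
For example: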
GET countries-codes/_search
{
  "_source": "alpha_3",
  "query": {
    "bool": {
      "filter": {
        "term": {
          "alpha_2.keyword": "AL"
        }
      }
    }
  }
}
In the returned hits object, you will see that only the requested field is returned from the document:
"hits" : {
  "total" : 1,
  "max_score" : 0.0,
  "hits" : [
    {
      "_index" : "countries-codes",
      "_type" : "event",
      "_id" : "qGDmEWoBqkB-aMRpdfvt",
      "_score" : 0.0,
      "_source" : {
        "alpha_3" : "ALB"
      }
    }
  ]
}
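On the client side, pulling the 3-character code out of such a response could look roughly like the sketch below. It assumes the response has already been parsed into a Python dict with the hits.hits layout shown in the examples; the function name is hypothetical. An empty hits array is treated as "no match" and yields None.

```python
from typing import Any, Dict, Optional


def extract_alpha3(response: Dict[str, Any]) -> Optional[str]:
    """Return alpha_3 from the first hit, or None when nothing matched."""
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    return hits[0].get("_source", {}).get("alpha_3")


if __name__ == "__main__":
    matched = {"hits": {"total": 1, "hits": [{"_source": {"alpha_3": "ALB"}}]}}
    empty = {"hits": {"total": 0, "hits": []}}
    print(extract_alpha3(matched))  # ALB
    print(extract_alpha3(empty))    # None
```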
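All of the examples are shown using Dev Tools / plain API calls. Since you are using Python, have a look at the officially maintained Elasticsearch libraries:
Elasticsearch DSL - built on top of the lower-level Elasticsearch-Py - https://elasticsearch-dsl.readthedocs.io/en/latest/
Elasticsearch-Py - https://elasticsearch-py.readthedocs.io/en/master/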
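As a rough idea of how the whole lookup might be wired together with Elasticsearch-Py, here is a hedged sketch. The function name and the localhost URL are assumptions, and the body= argument of search() is how the 6.x and 7.x clients take a request body (newer client versions prefer separate keyword arguments), so check it against the client version you install. The client is passed in as a parameter so the function can also be exercised with a stand-in object.

```python
from typing import Any, Optional


def alpha2_to_alpha3(client: Any, alpha_2: str, index: str = "countries-codes") -> Optional[str]:
    """Look up the 3-character code for a 2-character code, or return None."""
    body = {
        "_source": ["alpha_3"],  # source filtering: return only the field we need
        "size": 1,
        "query": {
            "bool": {
                "filter": {
                    # exact match on the keyword sub-field; assumes upper-case codes
                    "term": {"alpha_2.keyword": alpha_2.strip().upper()}
                }
            }
        },
    }
    response = client.search(index=index, body=body)
    hits = response["hits"]["hits"]
    return hits[0]["_source"]["alpha_3"] if hits else None


# Usage against a real cluster (requires the elasticsearch package; URL is an assumption):
#   from elasticsearch import Elasticsearch
#   client = Elasticsearch("http://localhost:9200")
#   print(alpha2_to_alpha3(client, "AL"))
```

A function like this is also a natural thing to call from the view behind an internal web page, since it returns a plain string or None.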