I have a jsonb column named data in Postgres, where each row (there are about 3 million rows) looks like this:
[
{
"number": 100,
"key": "this-is-your-key",
"listr": "20 Purple block, THE-CITY, Columbia",
"realcode": "LA40",
"ainfo": {
"city": "THE-CITY",
"county": "Columbia",
"street": "20 Purple block",
"var_1": ""
},
"booleanval": true,
"min_address": "20 Purple block, THE-CITY, Columbia LA40"
},
.....
]
I want to query the min_address field as fast as possible. In Django I tried:
APModel.objects.filter(data__0__min_address__icontains=search_term)
But this takes a very long time to complete (also, "THE-CITY" is uppercase, so I have to use icontains here). I tried dropping down to raw SQL like this:
cursor.execute("""\
SELECT * FROM "apmodel_ap_model"
WHERE ("apmodel_ap_model"."data"
#>> array['0', 'min_address'])
@> %s \
""",\
[json.dumps([{'min_address': search_term}])]
)
But this raises a strange error:
LINE 4: @> '[{"min_address": "some lane"}]'
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
I would like to know the fastest way to query the min_address field using a raw SQL cursor.
Answer 0 (score: 0)
A late answer, which may no longer help the OP. I am also no expert in Postgres / JSONB, so this may be a terrible idea.
Given this setup:
so49263641=# \d apmodel_ap_model;
Table "public.apmodel_ap_model"
Column | Type | Collation | Nullable | Default
--------+-------+-----------+----------+---------
data | jsonb | | |
so49263641=# select * from apmodel_ap_model ;
data
-------------------------------------------------------------------------------------------
[{"number": 1, "min_address": "Columbia"}, {"number": 2, "min_address": "colorado"}]
[{"number": 3, "min_address": " columbia "}, {"number": 4, "min_address": "California"}]
(2 rows)
The following query "expands" the objects in the data array into individual rows. It then applies pattern matching to the min_address field.
so49263641=# SELECT element->'number' as number, element->'min_address' as min_address
FROM apmodel_ap_model ap, JSONB_ARRAY_ELEMENTS(ap.data) element
WHERE element->>'min_address' ILIKE '%col%';
number | min_address
--------+---------------
1 | "Columbia"
2 | "colorado"
3 | " columbia "
(3 rows)
However, I suspect it casts each min_address value to text before pattern matching (the ->> operator returns text).
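To run the same query from Django's raw cursor, something like the sketch below might work. It assumes a DB-API cursor that uses %s placeholders (as psycopg2 and Django's connection.cursor() do); the helper names escape_like and build_min_address_query are made up for illustration. Note that the @> attempt in the question fails because #>> returns text, and @> is a jsonb containment operator; for a case-insensitive substring match, ILIKE on the text value is the operator that fits.

```python
def escape_like(term: str) -> str:
    """Escape LIKE/ILIKE wildcards so the user's term is matched literally.

    Backslash is the default LIKE escape character in Postgres.
    """
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_min_address_query(search_term: str):
    """Return (sql, params) for a case-insensitive substring search.

    Hypothetical helper: the SQL mirrors the query shown above, with the
    pattern passed as a bound parameter instead of being inlined.
    """
    sql = """
        SELECT element->'number' AS number,
               element->'min_address' AS min_address
        FROM apmodel_ap_model ap,
             jsonb_array_elements(ap.data) AS element
        WHERE element->>'min_address' ILIKE %s
    """
    return sql, ["%" + escape_like(search_term) + "%"]


# Usage with Django (not executed here, it needs a configured database):
#   from django.db import connection
#   sql, params = build_min_address_query("col")
#   with connection.cursor() as cursor:
#       cursor.execute(sql, params)
#       rows = cursor.fetchall()
```

Passing the pattern as a parameter avoids quoting problems and SQL injection. This version still expands every array in the table, so on 3 million rows it is unlikely to be fast without an index.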
Edit: there are some good suggestions on searching within indexed JSONB data at https://stackoverflow.com/a/33028467/1284043
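One indexing approach that might fit this case (my own sketch, not necessarily what the linked answer recommends) is a trigram expression index. It assumes the pg_trgm extension can be installed and that only the first array element needs to be searched, as in the question's data__0__min_address filter. The index name and the function search_first_min_address are hypothetical.

```python
# Run once, e.g. in a migration. Requires permission to create extensions.
CREATE_EXTENSION_SQL = "CREATE EXTENSION IF NOT EXISTS pg_trgm;"

# Expression index over the first element's min_address, extracted as text.
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS apmodel_min_address_trgm_idx
ON apmodel_ap_model
USING gin ((data #>> '{0,min_address}') gin_trgm_ops);
"""

# The WHERE clause repeats the indexed expression exactly so that the
# planner is able to consider the index.
SEARCH_SQL = """
SELECT * FROM apmodel_ap_model
WHERE data #>> '{0,min_address}' ILIKE %s;
"""


def search_first_min_address(cursor, term: str):
    """Case-insensitive substring search on data[0].min_address.

    `cursor` is any DB-API cursor using %s placeholders (an assumption).
    LIKE wildcards in `term` are escaped so they match literally.
    """
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    cursor.execute(SEARCH_SQL, ["%" + escaped + "%"])
    return cursor.fetchall()
```

Trigram indexes tend to help most when the search term has at least three characters; shorter patterns will probably fall back to a full scan. Checking the plan with EXPLAIN ANALYZE is the way to confirm whether the index is actually used.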