Efficiently searching an array of JSON objects in Postgres

Date: 2013-12-05 05:18:48

Tags: json postgresql

I have the following JSON:

{"metadata"=>{"result_type"=>"recent", "iso_language_code"=>"en"},
 "created_at"=>"Thu Feb 28 10:45:15 +0000 2013",
 "id"=>307079006698745857,
 "id_str"=>"307079006698745857",
 "text"=>
  "@borkdude @Rebel_Labs there are 7500+ people on the mailing list, too: http://t.co/pswvhvqJPE",
 "source"=>
  "<a href=\"http://tapbots.com/software/tweetbot/mac\" rel=\"nofollow\">Tweetbot for Mac</a>",
 "truncated"=>false,
 "in_reply_to_status_id"=>307049603952414720,
 "in_reply_to_status_id_str"=>"307049603952414720",
 "in_reply_to_user_id"=>15446348,
 "in_reply_to_user_id_str"=>"15446348",
 "in_reply_to_screen_name"=>"borkdude",
 "user"=>
  {"id"=>13033522,
   "id_str"=>"13033522",
   "name"=>"Michael Klishin",
   "screen_name"=>"michaelklishin",
   "location"=>"",
   "description"=>
    "Multilingual. Curious about how things work. Software, concurrency, OSS. Data, urbanism. Trance, dubstep, lolgifs. @ClojureWerkz mastermind, ex-@travisci core.",
   "url"=>"http://bit.ly/nTTvfC",
   "entities"=>
    {"url"=>
      {"urls"=>
        [{"url"=>"http://bit.ly/nTTvfC",
          "expanded_url"=>nil,
          "indices"=>[0, 20]}]},
     "description"=>{"urls"=>[]}},
   "protected"=>false,
   "followers_count"=>805,
   "friends_count"=>215,
   "listed_count"=>39,
   "created_at"=>"Mon Feb 04 04:11:13 +0000 2008",
   "favourites_count"=>61,
   "utc_offset"=>14400,
   "time_zone"=>"Moscow",
   "geo_enabled"=>false,
   "verified"=>false,
   "statuses_count"=>5833,
   "lang"=>"es",
   "contributors_enabled"=>false,
   "is_translator"=>false,
   "profile_background_color"=>"C0DEED",
   "profile_background_image_url"=>
    "http://a0.twimg.com/images/themes/theme1/bg.png",
   "profile_background_image_url_https"=>
    "https://si0.twimg.com/images/themes/theme1/bg.png",
   "profile_background_tile"=>false,
   "profile_image_url"=>
    "http://a0.twimg.com/profile_images/3190382095/8485cc3e3534ffd2eef41854204d34e4_normal.jpeg",
   "profile_image_url_https"=>
    "https://si0.twimg.com/profile_images/3190382095/8485cc3e3534ffd2eef41854204d34e4_normal.jpeg",
   "profile_link_color"=>"0084B4",
   "profile_sidebar_border_color"=>"C0DEED",
   "profile_sidebar_fill_color"=>"DDEEF6",
   "profile_text_color"=>"333333",
   "profile_use_background_image"=>true,
   "default_profile"=>true,
   "default_profile_image"=>false,
   "following"=>nil,
   "follow_request_sent"=>nil,
   "notifications"=>nil},
 "geo"=>nil,
 "coordinates"=>nil,
 "place"=>nil,
 "contributors"=>nil,
 "retweet_count"=>0,
 "entities"=>
  {"hashtags"=>[],
   "urls"=>
    [{"url"=>"http://t.co/pswvhvqJPE",
      "expanded_url"=>"http://groups.google.com/group/clojure",
      "display_url"=>"groups.google.com/group/clojure",
      "indices"=>[71, 93]}],
   "user_mentions"=>
    [{"screen_name"=>"borkdude",
      "name"=>"Michiel Borkent",
      "id"=>15446348,
      "id_str"=>"15446348",
      "indices"=>[0, 9]},
     {"screen_name"=>"Rebel_Labs",
      "name"=>"Rebel Labs",
      "id"=>904047793,
      "id_str"=>"904047793",
      "indices"=>[10, 21]}]},
 "favorited"=>false,
 "retweeted"=>false,
 "possibly_sensitive"=>false}

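This is stored in a Postgres table created with:

create table tweets (id bigint, tweet json, constraint id primary key (id));

What is the most efficient way to find all entries that have an object in tweet->'entities'->'user_mentions' with 'screen_name' == 'SOME_VALUE'?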

1 answer:

Answer 0: (score: 2)

I found some inspiration in Index for finding an element in a JSON array.

What you need to do is:

  • Create an immutable function that produces the array for the GIN index:

    mf=# CREATE OR REPLACE FUNCTION json_val_arr(_j json, _key text)
    mf-#   RETURNS text[] AS
    mf-# $$
    mf$# SELECT array_agg(elem->>_key)
    mf$# FROM json_array_elements(_j) AS x(elem)
    mf$# $$
    mf-#   LANGUAGE sql IMMUTABLE;
    CREATE FUNCTION

  • Create a GIN index using that function:

    mf=# CREATE INDEX entities_user_mentions_screen_name ON "1".tweets USING GIN (json_val_arr(tweet->'entities'->'user_mentions', 'screen_name'));

  • Query:

    mf=# select id from "1".tweets where '{"Rebel_Labs"}'::text[] <@ (json_val_arr(tweet->'entities'->'user_mentions', 'screen_name'));

    id
    307079006698745857
    307063068662321152
    307049603952414720
    306869345110351872
    306436498360774656
    308672668985593856
    308645862236643328
    309979789794619392
    (8 rows)

    Time: 8,356 ms
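
To check the steps above, it can help to call the helper on a literal array and then ask the planner whether it picks up the expression index. The following is a minimal sketch, assuming the json_val_arr function and the "1".tweets table from this answer already exist; the sample literal is made up for illustration. On a small table the planner may still choose a sequential scan, so an index scan in the EXPLAIN output is not guaranteed.

```sql
-- Sanity check: the helper should pull one key out of every element of a
-- JSON array and return the values as text[] (here: both screen names).
SELECT json_val_arr(
         '[{"screen_name":"borkdude"},{"screen_name":"Rebel_Labs"}]'::json,
         'screen_name');

-- Ask the planner how it would run the search. The expression in the WHERE
-- clause must match the indexed expression exactly for the GIN index to be
-- considered. "indexed_expr @> ARRAY[...]" is the same containment test as
-- "ARRAY[...] <@ indexed_expr" in the query above.
EXPLAIN
SELECT id
FROM "1".tweets
WHERE json_val_arr(tweet->'entities'->'user_mentions', 'screen_name')
      @> ARRAY['Rebel_Labs'];
```

Note that array_agg returns NULL when the array is empty, so tweets without any user mentions simply never match the containment test.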