CouchDB view - filter and group by key array

Time: 2018-04-04 00:55:06

Tags: group-by couchdb views filtering

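Question

I have a set of keys in a CouchDB view, [doc.time, doc.address]. Neither is unique. doc.time is a UNIX timestamp and doc.address is a string. The reduce function is set to _sum, since the only value for each set of keys is a number.

What I want is to filter by doc.time and then group the remaining records by doc.address. If I put doc.time as the first key, I can't seem to group by unique address no matter what I specify as group_level. If I put doc.address first, I can't seem to filter the query by time.

Two examples

Query: ?group_level=1&startkey=[0,1230000000]&endkey=[{},1340000000]

First key: doc.address before doc.time

Problem: does not filter by time

Code: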

rows: [
  {
    key: [ "1126GDuGLQTX3LFHHmjCctdn8WKDjn7QNA" ],
    value: 50
  },
  {
    key: [ "112AobLhjLJQ3LGqXFrsdnWMPqWCQqoiS6" ],
    value: 50
  }
]

Query: ?group_level=1&startkey=[1230000000]&endkey=[1340000000,{}]

First key: doc.time before doc.address

Problem: as you can see, the results are not grouped by doc.address

Code:

rows: [
  {
    key: [ 1231469665 ],
    value: 50
  },
  {
    key: [ 1231469744 ],
    value: 50
  }
]

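1 answer:

Answer 0 (score: 1)

You mentioned:

...If I put doc.time as the first key, I can't seem to group by unique address no matter what I specify as group_level...

The query parameter group_level=N in effect splits the key at its Nth comma and groups together the rows whose left-hand elements match. So when your array key looks like [doc.time, doc.address], you cannot group by address, because doc.address is not on the left-hand side of the comma.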
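The grouping behaviour described above can be sketched in plain JavaScript. This is only a simplified model, not CouchDB's implementation: the helper name `groupLevel` and the sample rows (including the addresses `addrA` and `addrB`) are hypothetical, and the rows are assumed to be already sorted by key with a `_sum` reduce.

```javascript
// Simplified model of a view with a _sum reduce queried with ?group_level=N.
// `rows` are assumed to be already sorted by key, as in a view index.
function groupLevel(rows, level) {
  const groups = [];
  for (const { key, value } of rows) {
    const prefix = key.slice(0, level);        // keep only the first N key elements
    const last = groups[groups.length - 1];
    if (last && JSON.stringify(last.key) === JSON.stringify(prefix)) {
      last.value += value;                     // same prefix: fold into the current group
    } else {
      groups.push({ key: prefix, value });     // new prefix: start a new group
    }
  }
  return groups;
}

// Hypothetical rows keyed by [time, address].
const timeFirstRows = [
  { key: [1231469665, "addrA"], value: 50 },
  { key: [1231469744, "addrA"], value: 50 },
  { key: [1231469744, "addrB"], value: 50 },
];

// group_level=1 groups by time only; the address is dropped from the key.
console.log(groupLevel(timeFirstRows, 1));
// group_level=2 keeps the address, but addrA still appears once per timestamp.
console.log(groupLevel(timeFirstRows, 2));
```

With the time first, neither level yields one row per address, which matches the behaviour reported in the question.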

  

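...If I put doc.address first, I can't seem to filter the query by time...

If the array key looks like [doc.address, doc.time], note that you are emitting an array key in your map function. For array keys (compound keys) in CouchDB, you need to consider the following point, described at this reference:

...first caveat and very important... array output from the JavaScript map function... each index key is a string and is ordered character by character as a string, including brackets and commas...

The statement above and the explanation in the reference have major implications for how CouchDB indexing works with compound (array) keys. In practice, array keys are compared from left to right, so the first element dominates the ordering.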
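To make the ordering concrete, here is a minimal sketch of a left-to-right comparison for array keys whose elements are all strings. It is an illustration under simplifying assumptions: the function name `compareKeys` is hypothetical, and plain JavaScript string comparison stands in for CouchDB's own collation rules, which is adequate only for ASCII sample data such as "CA" or "2014".

```javascript
// Compare two array keys element by element, left to right.
// Assumption: all elements are ASCII strings, so JavaScript's built-in
// string comparison is a reasonable stand-in for the real collation.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length; // a key that is a prefix of another sorts first
}

const keys = [["TX", "2011"], ["CA", "2016"], ["NY", "2018"], ["CA", "2014"]];
keys.sort(compareKeys);

// Sorted order: CA/2014, CA/2016, NY/2018, TX/2011.
// The time only breaks ties between rows that share the same address.
console.log(keys);
```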

To clarify, we can create documents like the following in a database named sample:

{"time":"2011","address":"CT"}
{"time":"2012","address":"CT"}
...
{"time":"2011","address":"TX"}
...
{"time":"2015","address":"TX"}
...
{"time":"2014","address":"NY"}
...
{"time":"2014","address":"CA"}
{"time":"2015","address":"CA"}
{"time":"2016","address":"CA"}

I implemented the view's map function like this:

function (doc) {
  if (doc.time && doc.address) {
    emit([doc.address, doc.time], null);
  }
}

Querying the view without any parameters returns the following rows:

$ curl -k -X GET 'https://admin:****@192.168.1.106:6984/sample/_design/by_addr_time/_view/by_addr_time'
{"total_rows":25,"offset":0,"rows":[
{"id":"doc_0022","key":["CA","2014"],"value":null},
{"id":"doc_0023","key":["CA","2015"],"value":null},
{"id":"doc_0024","key":["CA","2016"],"value":null},
{"id":"doc_0000","key":["CT","2011"],"value":null},
{"id":"doc_0001","key":["CT","2012"],"value":null},
{"id":"doc_0002","key":["CT","2013"],"value":null},
{"id":"doc_0003","key":["CT","2014"],"value":null},
{"id":"doc_0004","key":["CT","2015"],"value":null},
{"id":"doc_0005","key":["CT","2016"],"value":null},
{"id":"doc_0014","key":["NY","2011"],"value":null},
{"id":"doc_0015","key":["NY","2012"],"value":null},
{"id":"doc_0016","key":["NY","2013"],"value":null},
{"id":"doc_0017","key":["NY","2014"],"value":null},
{"id":"doc_0018","key":["NY","2015"],"value":null},
{"id":"doc_0019","key":["NY","2016"],"value":null},
{"id":"doc_0020","key":["NY","2017"],"value":null},
{"id":"doc_0021","key":["NY","2018"],"value":null},
{"id":"doc_0006","key":["TX","2011"],"value":null},
{"id":"doc_0008","key":["TX","2012"],"value":null},
{"id":"doc_0007","key":["TX","2013"],"value":null},
{"id":"doc_0009","key":["TX","2014"],"value":null},
{"id":"doc_0010","key":["TX","2015"],"value":null},
{"id":"doc_0011","key":["TX","2016"],"value":null},
{"id":"doc_0012","key":["TX","2017"],"value":null},
{"id":"doc_0013","key":["TX","2018"],"value":null}
]}

For now I am not using any reduce function; let's ignore grouping and reducing and focus on plain, simple indexing. The rows above are the key/value pairs that the view generates for the index.

Now I want to query the view and filter by doc.time. My query parameters are:

?startkey=["AA","2017"]&endkey=["ZZ","2018"]

I expect this query to return only documents whose time field is between 2017 and 2018, with any value in the address field, because the range from AA to ZZ covers every address in my database. I run the query with curl:

$ curl -k -X GET 'https://admin:****@192.168.1.106:6984/sample/_design/by_addr_time/_view/by_addr_time?startkey=\["AA","2017"\]&endkey=\["ZZ","2018"\]'
{"total_rows":25,"offset":0,"rows":[
{"id":"doc_0022","key":["CA","2014"],"value":null},
{"id":"doc_0023","key":["CA","2015"],"value":null},
{"id":"doc_0024","key":["CA","2016"],"value":null},
{"id":"doc_0000","key":["CT","2011"],"value":null},
{"id":"doc_0001","key":["CT","2012"],"value":null},
{"id":"doc_0002","key":["CT","2013"],"value":null},
{"id":"doc_0003","key":["CT","2014"],"value":null},
{"id":"doc_0004","key":["CT","2015"],"value":null},
{"id":"doc_0005","key":["CT","2016"],"value":null},
{"id":"doc_0014","key":["NY","2011"],"value":null},
{"id":"doc_0015","key":["NY","2012"],"value":null},
{"id":"doc_0016","key":["NY","2013"],"value":null},
{"id":"doc_0017","key":["NY","2014"],"value":null},
{"id":"doc_0018","key":["NY","2015"],"value":null},
{"id":"doc_0019","key":["NY","2016"],"value":null},
{"id":"doc_0020","key":["NY","2017"],"value":null},
{"id":"doc_0021","key":["NY","2018"],"value":null},
{"id":"doc_0006","key":["TX","2011"],"value":null},
{"id":"doc_0008","key":["TX","2012"],"value":null},
{"id":"doc_0007","key":["TX","2013"],"value":null},
{"id":"doc_0009","key":["TX","2014"],"value":null},
{"id":"doc_0010","key":["TX","2015"],"value":null},
{"id":"doc_0011","key":["TX","2016"],"value":null},
{"id":"doc_0012","key":["TX","2017"],"value":null},
{"id":"doc_0013","key":["TX","2018"],"value":null}
]}

The response may seem shocking, because it does not return only the documents whose time field is between 2017 and 2018. This is how CouchDB indexing works for array keys. CouchDB indexes array keys as if the whole array were a single string, including the brackets and commas! Every address in the index falls between AA and ZZ, so every row is inside the range regardless of its time. If you read the reference, it will start to make sense.

Now let's change the query parameters:

?startkey=["CT","2016"]&endkey=["TX","2011"]

The result of this query is shown below, and it should make sense given the explanation above:

$ curl -k -X GET 'https://admin:****@192.168.1.106:6984/sample/_design/by_addr_time/_view/by_addr_time?startkey=\["CT","2016"\]&endkey=\["TX","2011"\]'
{"total_rows":25,"offset":8,"rows":[
{"id":"doc_0005","key":["CT","2016"],"value":null},
{"id":"doc_0014","key":["NY","2011"],"value":null},
{"id":"doc_0015","key":["NY","2012"],"value":null},
{"id":"doc_0016","key":["NY","2013"],"value":null},
{"id":"doc_0017","key":["NY","2014"],"value":null},
{"id":"doc_0018","key":["NY","2015"],"value":null},
{"id":"doc_0019","key":["NY","2016"],"value":null},
{"id":"doc_0020","key":["NY","2017"],"value":null},
{"id":"doc_0021","key":["NY","2018"],"value":null},
{"id":"doc_0006","key":["TX","2011"],"value":null}
]}
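Both range queries can be reproduced with a small in-memory sketch. It assumes string-only keys and a simplified left-to-right comparison in place of CouchDB's real collation; the names `compareArrayKeys` and `rangeQuery` are hypothetical, and the index below is only a subset of the [address, time] keys from the sample view.

```javascript
// Simplified left-to-right comparison of string-only array keys.
function compareArrayKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Keep the keys with startkey <= key <= endkey, as a view range query does.
function rangeQuery(index, startkey, endkey) {
  return index.filter(
    key => compareArrayKeys(key, startkey) >= 0 && compareArrayKeys(key, endkey) <= 0
  );
}

// A subset of the [address, time] keys from the sample view.
const index = [
  ["CA", "2014"], ["CA", "2016"], ["CT", "2011"], ["CT", "2016"],
  ["NY", "2011"], ["NY", "2018"], ["TX", "2011"], ["TX", "2018"],
];

// Every address lies between "AA" and "ZZ", so the time bounds never matter.
const everything = rangeQuery(index, ["AA", "2017"], ["ZZ", "2018"]);
console.log(everything.length); // 8

// The time bounds only apply at the two boundary addresses, CT and TX.
const narrowed = rangeQuery(index, ["CT", "2016"], ["TX", "2011"]);
console.log(narrowed); // CT/2016, NY/2011, NY/2018, TX/2011
```

The second call shows that the time component only trims the first and last address in the range, while every NY row is returned whatever its time.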

Update

...What I want is to filter by doc.time and then group the remaining records by doc.address...

So, what should we do? There is a good question and answer that provides the basic ideas.

I am not sure which idea is best, but I implemented one of them. I created a view named t_red with the built-in _count reduce, as shown below:

function (doc) {
  if (doc.time && doc.address) {
    emit([doc.time, doc.address], null);
  }
}

I also created a view named a_red with the built-in _count reduce:

function (doc) {
  if (doc.address && doc.time) {
    emit([doc.address, doc.time], null);
  }
}

Then I wrote the following code in Node.js to query doc.time between 2012 and 2015 and then group the results by doc.address. The console logs are shown as comments in the code. I hope this code helps (don't get confused!):

process.env.NODE_TLS_REJECT_UNAUTHORIZED = "0"; // Ignore rejection, because the CouchDB SSL certificate is self-signed

const fetch = require('node-fetch')

// query "t_red" view/index
fetch(`https://admin:****@192.168.1.106:6984/sample/_design/t_red/_view/t_red?group_level=2&startkey=["2012", "AA"]&endkey=["2015", "ZZ"]`, {
  method: 'GET',
  headers: {
    'Content-Type': 'application/json',
  }
}).then(
  res => res.json()
).then(data => {
  let unique_addr = []
  data.rows.map(row => {
    console.log('row.key-> ', row.key, ' row.value-> ', row.value)
    // console log is shown below:
    //
    // row.key-> [ '2012', 'CT' ] row.value-> 1
    // row.key-> [ '2012', 'NY' ] row.value-> 1
    // row.key-> [ '2012', 'TX' ] row.value-> 1
    // row.key-> [ '2013', 'CT' ] row.value-> 1
    // row.key-> [ '2013', 'NY' ] row.value-> 1
    // row.key-> [ '2013', 'TX' ] row.value-> 1
    // row.key-> [ '2014', 'CA' ] row.value-> 1
    // row.key-> [ '2014', 'CT' ] row.value-> 1
    // row.key-> [ '2014', 'NY' ] row.value-> 1
    // row.key-> [ '2014', 'TX' ] row.value-> 1
    // row.key-> [ '2015', 'CA' ] row.value-> 1
    // row.key-> [ '2015', 'CT' ] row.value-> 1
    // row.key-> [ '2015', 'NY' ] row.value-> 1
    // row.key-> [ '2015', 'TX' ] row.value-> 1
    if (unique_addr.indexOf(row.key[1]) == -1) { // Push unique addresses into an array
      unique_addr.push(row.key[1])
    }
  })
  console.log(unique_addr)
  // Console log is shown below:
  //
  // [ 'CT', 'NY', 'TX', 'CA' ]
  return unique_addr
}).then(unique_addr => {
  // Group the unique addresses
  let group_by_address = unique_addr.map(addr => {
    // For each unique address, do a query of "a_red" view/index
    return fetch(`https://admin:****@192.168.1.106:6984/sample/_design/a_red/_view/a_red?group_level=2&startkey=["${addr}","2012"]&endkey=["${addr}","2015"]`, {
      method: 'GET',
      headers: {
        'Content-Type': 'application/json',
      }
    }).then(
      res => res.json()
    ).then(data => {
      data.rows.map(row => { console.log('row.key-> ', row.key, ' row.value-> ', row.value) })
      // Console logs related to this section of code are shown below
      //row.key-> [ 'CA', '2014' ] row.value-> 1
      //row.key-> [ 'CA', '2015' ] row.value-> 1
      //row.key-> [ 'NY', '2012' ] row.value-> 1
      //row.key-> [ 'NY', '2013' ] row.value-> 1
      //row.key-> [ 'NY', '2014' ] row.value-> 1
      //row.key-> [ 'NY', '2015' ] row.value-> 1
      //row.key-> [ 'CT', '2012' ] row.value-> 1
      //row.key-> [ 'CT', '2013' ] row.value-> 1
      //row.key-> [ 'CT', '2014' ] row.value-> 1
      //row.key-> [ 'CT', '2015' ] row.value-> 1
      //row.key-> [ 'TX', '2012' ] row.value-> 1
      //row.key-> [ 'TX', '2013' ] row.value-> 1
      //row.key-> [ 'TX', '2014' ] row.value-> 1
      //row.key-> [ 'TX', '2015' ] row.value-> 1
      let obj = {}
      obj[addr] = data.rows.length // This object contains unique address and its corresponding frequency in above query
      return obj
    }).catch(err => {
      console.log('err-> ', err)
    })
  })
  return group_by_address
}).then(group_by_address => {
  group_by_address.map(group => {
    group.then(() => {
      console.log('Grouped by address-> ', group)
      // Console logs related this section of code are shown below:
      //Grouped by address-> Promise { { CA: 2 } }
      //Grouped by address-> Promise { { NY: 4 } }
      //Grouped by address-> Promise { { CT: 4 } }
      //Grouped by address-> Promise { { TX: 4 } }
    })
  })
}).catch(err => {
  console.log('err-> ', err)
})

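As a possible simplification, and not the method used in the answer above, the grouping step could also be done on the client from the rows of the time-first view alone, without a second round of queries. The sketch below assumes rows shaped like a group_level=2 response from a [time, address] view with a _count reduce; the function name `groupByAddress` and the sample rows are hypothetical.

```javascript
// Hypothetical rows, shaped like a group_level=2 response from a
// [time, address] view with a _count reduce, already limited to a time range.
const rows = [
  { key: ["2012", "CT"], value: 1 },
  { key: ["2012", "NY"], value: 1 },
  { key: ["2013", "CT"], value: 1 },
  { key: ["2014", "CA"], value: 1 },
];

// Sum the reduced values per address (the second element of each key).
function groupByAddress(rows) {
  const totals = {};
  for (const { key, value } of rows) {
    const address = key[1];
    totals[address] = (totals[address] || 0) + value;
  }
  return totals;
}

console.log(groupByAddress(rows)); // { CT: 2, NY: 1, CA: 1 }
```

The trade-off is that every row in the time range is transferred to the client, which may be acceptable for small ranges but not for large ones.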