I am new to MongoDB, so please bear with me. I have a CSV file, example.csv, that looks like this:
Sample,Chromosome,Position,Reference,Mutation,ReadDepth
testfile_snp,chr1,69511,A,G,10
testfile_snp,chr1,924024,C,G,12
testfile_snp,chr1,924533,A,G,13
testfile_snp,chr1,942451,T,C,22
testfile_snp,chr1,946247,G,A,44
testfile_snp,chr1,952421,A,G,32
testfile_snp,chr1,953259,T,C,37
testfile_snp,chr1,953279,T,C,23
testfile_snp,chr1,961945,G,C,40
testfile_snp,chr1,966227,C,G,35
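I have many such files, each with about 25k rows, and I want to query MongoDB for every row. In my database, Sample, Chromosome, Position, Reference and Mutation are indexed together as a compound index. I looked around for a solution, and the only relevant thing I found was another thread. I can convert the CSV into queries with the following commands: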
gawk -i inplace -F',' '{print "db.TestCollection.find({\"Sample\": \"" $1 "\", \"Chromosome\": \"" $2 "\", \"Position\": " $3 ", \"Reference\": \"" $4 "\", \"Mutation\": \"" $5 "\"})"}' example.csv
sed -i "1s/.*/use TestDatabase/" example.csv
mv example.csv example.js
This produces:
use TestDatabase
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 69511, "Reference": "A", "Mutation": "G"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 924024, "Reference": "C", "Mutation": "G"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 924533, "Reference": "A", "Mutation": "G"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 942451, "Reference": "T", "Mutation": "C"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 946247, "Reference": "G", "Mutation": "A"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 952421, "Reference": "A", "Mutation": "G"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 953259, "Reference": "T", "Mutation": "C"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 953279, "Reference": "T", "Mutation": "C"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 961945, "Reference": "G", "Mutation": "C"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 966227, "Reference": "C", "Mutation": "G"})
I can then feed this file to MongoDB:
mongo < example.js
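This is how I currently query each row. However, I found another thread showing that the $in operator can be used for bulk queries. The problem is that it behaves like an OR within every field it is given, so any combination of the listed values matches: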
use TestDatabase
db.TestCollection.find({"Sample": { $in : ["testfile_snp", "sv37213_hg38"] }, "Chromosome": "chr1", "Position": { $in : [69270,182585422]}, "Reference" : {$in : ["A", "C"]}, "Mutation" : {$in : ["G", "T"]} } )
This gives:
MongoDB shell version v4.0.8
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("fb07f25a-3a4f-4c32-bd4e-70f3c3129435") }
MongoDB server version: 4.0.8
switched to db TestDatabase
{ "_id" : ObjectId("5ca47c1e0953f323b3b9cac5"), "Sample" : "sv37213_hg38", "Chromosome" : "chr1", "Position" : 69270, "Reference" : "A", "Mutation" : "G", "ReadDepth" : 19 }
{ "_id" : ObjectId("5ca47c1e0953f323b3b9e10f"), "Sample" : "sv37213_hg38", "Chromosome" : "chr1", "Position" : 182585422, "Reference" : "C", "Mutation" : "T", "ReadDepth" : 66 }
{ "_id" : ObjectId("5ca47bca0953f323b39019b1"), "Sample" : "test-exome-1_hg38", "Chromosome" : "chr1", "Position" : 69270, "Reference" : "A", "Mutation" : "G", "ReadDepth" : 17 }
bye
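As you can see, this query returns 2 documents for sv37213_hg38, which is not what I was hoping for. I only want the one at position 182585422 to be printed.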
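To illustrate why both sv37213_hg38 documents come back, here is a minimal sketch in plain JavaScript. The `matches` function is a hypothetical, simplified stand-in for filter evaluation (equality and `$in` only), not MongoDB's actual implementation. It shows that each `$in` is checked per field, independently of the others.

```javascript
// Hypothetical, simplified filter evaluation: every field must match,
// and a field with $in matches if its value is anywhere in the list.
function matches(doc, filter) {
  return Object.keys(filter).every(function (field) {
    const cond = filter[field];
    if (cond !== null && typeof cond === "object" && Array.isArray(cond.$in)) {
      return cond.$in.includes(doc[field]);
    }
    return doc[field] === cond;
  });
}

// The per-field $in filter from the question.
const perFieldIn = {
  Sample: { $in: ["testfile_snp", "sv37213_hg38"] },
  Chromosome: "chr1",
  Position: { $in: [69270, 182585422] },
  Reference: { $in: ["A", "C"] },
  Mutation: { $in: ["G", "T"] },
};

// The two sv37213_hg38 documents from the output above (ReadDepth omitted).
const docs = [
  { Sample: "sv37213_hg38", Chromosome: "chr1", Position: 69270, Reference: "A", Mutation: "G" },
  { Sample: "sv37213_hg38", Chromosome: "chr1", Position: 182585422, Reference: "C", Mutation: "T" },
];

// Both documents pass, because each field is tested on its own.
const matchedByIn = docs.filter(function (d) { return matches(d, perFieldIn); });

// A filter that pins all five fields to one row matches only that row.
const oneRow = { Sample: "sv37213_hg38", Chromosome: "chr1", Position: 182585422, Reference: "C", Mutation: "T" };
const matchedByRow = docs.filter(function (d) { return matches(d, oneRow); });

console.log(matchedByIn.length, matchedByRow.length);
```

The per-field lists describe a set of allowed values for each column, not a list of rows, so the filter cannot express "this exact combination".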
Is there any feature in mongo for querying the entire contents of a file in bulk, or do I have to run a query for every row?
Answer 0 (score: 0)
You can use $or instead of $in, and simply list the original queries you were running, one per clause.
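As a rough sketch of that idea, the snippet below turns CSV rows into one filter per row and groups them into `$or` queries. The helper names `rowsToFilters` and `buildOrQueries` and the batch size are assumptions made for illustration, not part of MongoDB. The snippet is plain JavaScript that runs under Node.js, and the resulting objects are what would be passed to `db.TestCollection.find(...)`.

```javascript
// Hypothetical helper: parse CSV text into one exact-match filter per row.
// Assumes a header line and no quoted or embedded commas.
function rowsToFilters(csvText) {
  return csvText
    .trim()
    .split(/\r?\n/)
    .slice(1) // skip the header line
    .filter(function (line) { return line.length > 0; })
    .map(function (line) {
      const parts = line.split(",");
      return {
        Sample: parts[0],
        Chromosome: parts[1],
        Position: Number(parts[2]), // stored as a number in the collection
        Reference: parts[3],
        Mutation: parts[4],
      };
    });
}

// Hypothetical helper: group the row filters into { $or: [...] } queries
// of at most batchSize clauses each.
function buildOrQueries(filters, batchSize) {
  const queries = [];
  for (let i = 0; i < filters.length; i += batchSize) {
    queries.push({ $or: filters.slice(i, i + batchSize) });
  }
  return queries;
}

const csv = [
  "Sample,Chromosome,Position,Reference,Mutation,ReadDepth",
  "testfile_snp,chr1,69511,A,G,10",
  "testfile_snp,chr1,924024,C,G,12",
  "testfile_snp,chr1,924533,A,G,13",
].join("\n");

// Batch size 2 is only for demonstration; a real run would use a larger value.
const queries = buildOrQueries(rowsToFilters(csv), 2);

// In the mongo shell, each batch would then be run as:
//   db.TestCollection.find(queries[0])
console.log(JSON.stringify(queries[0]));
```

Batching is a design choice rather than a requirement: it keeps each query document small, well under the 16 MB BSON document limit, while still replacing thousands of single-row `find` calls with a handful of requests. Because every clause is an equality match on all five indexed fields, each clause should be able to use the compound index, though it is worth confirming that with `explain()` on your own data.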