How to read CSV files with awk and store the data in MongoDB

Time: 2017-02-10 10:23:36

Tags: bash mongodb awk gawk

I have 2 CSV files (10 GB each).

a.csv

  23,88,564
  21,56,461

b.csv

  23,88,1145
  21,56,5763

The collection should look like this:

{
    "_id" : ObjectId("589b264efbb76e87b3611f3d"),
    "longitude" : 23,
    "latitude" : 88,
    "band_4" : 564,
    "band_8" : 1145
}
{
    "_id" : ObjectId("589b264efbb76e87b3611f3d"),
    "longitude" : 21,
    "latitude" : 56,
    "band_4" : 461,
    "band_8" : 5763
}

The data should be imported into the MongoDB collection row by row. Can anyone help me solve this?

2 answers:

Answer 0: (score: 0)

$ awk -F, '
NR==FNR {                     # first file (a.csv): remember band_4 under the key "longitude,latitude"
    a[$1 FS $2]=$3
    next
}
(($1 FS $2) in a) {           # second file (b.csv): print one document per matching key
    print "{"
    print "\"longitude\" : " $1 ","
    print "\"latitude\" : " $2 ","
    print "\"band_4\" : " a[$1 FS $2] ","
    print "\"band_8\" : " $3
    print "}"
}' a.csv b.csv

Output:

{
"longitude" : 23,
"latitude" : 88,
"band_4" : 564,
"band_8" : 1145
}
{
"longitude" : 21,
"latitude" : 56,
"band_4" : 461,
"band_8" : 5763
}
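
To get from this output into MongoDB, one option is to print each document on a single line and pipe the stream to mongoimport, which reads JSON documents from standard input when no input file is given. The following is only a sketch: the function name merge_bands, the database name geodata, and the collection name bands are made up for illustration, and the import line is left commented out because it needs a running MongoDB server. No _id is written, so MongoDB assigns a unique one to each document on insert.

```shell
# Sketch: merge two CSV files on (longitude, latitude) and emit one JSON
# document per line, a form that mongoimport can read from standard input.
merge_bands() {
    # $1 = CSV with band_4 in column 3, $2 = CSV with band_8 in column 3
    awk -F, '
    NR == FNR { band4[$1 FS $2] = $3; next }   # first file: remember band_4 by key
    ($1 FS $2) in band4 {                      # second file: print the merged document
        printf "{\"longitude\": %s, \"latitude\": %s, \"band_4\": %s, \"band_8\": %s}\n",
               $1, $2, band4[$1 FS $2], $3
    }' "$1" "$2"
}

# Hypothetical database and collection names; adjust them to your deployment.
# merge_bands a.csv b.csv | mongoimport --db geodata --collection bands
```

Like the answer above, this keeps every row of the first file in an awk array, so memory use grows with the size of a.csv.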

Answer 1: (score: 0)

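awk 'BEGIN{ FPAT="(^[^,]*,[^,]*)|([^,]*$)"; split( "longitude latitude band_4 band_8", N)}

     # first file: index each full line by its "longitude,latitude" key ($1 under FPAT)
     FNR==NR{ F[$1]=$0; next}

     # second file: append its band value to the stored line and print one document
     ($1 in F) { 
         split( F[$1]","$2, D, /,/)
         # placeholder _id copied from the question; omit it to let MongoDB assign unique ids
         print "       {\n\"_id\" : ObjectId(\"589b264efbb76e87b3611f3d\"),"
         # no comma after the last field
         for (i=1; i<=4;i++) printf( "\"%s\" : %d%s\n", N[i], D[i], (i<4 ? "," : ""))
         print "       }"
         }
    ' a.csv b.csv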
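  • Requires gawk 4 or later for FPAT (check with awk --version). FPAT defines what the content of a field looks like, the opposite of FS, which specifies "what is not in a field" (the separator).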
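
Both answers hold the whole first file in an awk array, which may be a problem with 10 GB inputs. If, as in the sample data, row N of a.csv and row N of b.csv always carry the same coordinates (an assumption the question does not confirm), the two files can be streamed side by side with paste instead. A minimal sketch under that assumption; the function name merge_aligned is made up for illustration:

```shell
# Sketch: stream both files in parallel instead of loading one into memory.
# Assumes row N of both files describes the same (longitude, latitude).
merge_aligned() {
    # paste joins the rows as: lon,lat,band_4,lon,lat,band_8
    paste -d, "$1" "$2" | awk -F, '
    $1 == $4 && $2 == $5 {      # coordinates agree: emit one JSON document per line
        printf "{\"longitude\": %s, \"latitude\": %s, \"band_4\": %s, \"band_8\": %s}\n", $1, $2, $3, $6
        next
    }
    { print "row " NR ": coordinates differ, skipped" > "/dev/stderr" }'
}
```

Rows whose coordinates disagree are reported on standard error and skipped, so a misaligned pair of files shows up as warnings rather than as wrong documents.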