I have two CSV files (10 GB each):
a.csv
23,88,564
21,56,461
b.csv
23,88,1145
21,56,5763
The collection should look like this:
{
"_id" : ObjectId("589b264efbb76e87b3611f3d"),
"longitude" : 23,
"latitude" : 88,
"band_4" : 564,
"band_8" : 1145
}
{
"_id" : ObjectId("589b264efbb76e87b3611f3d"),
"longitude" : 21,
"latitude" : 56,
"band_4" : 461,
"band_8" : 5763
}
The data should be imported into the MongoDB collection row by row. Can anyone help me solve this?
Answer 0 (score: 0)
$ awk -F, '
NR==FNR {                  # first file (a.csv): store band_4 under "longitude latitude"
a[$1 OFS $2]=$3
next
}
(($1 OFS $2) in a) {       # second file (b.csv): the same coordinates were seen in a.csv
print "{"
print "\"longitude\" : " $1 ","
print "\"latitude\" : " $2 ","
print "\"band_4\" : " a[$1 OFS $2] ","
print "\"band_8\" : " $3
print "}"
}' a.csv b.csv
Output:
{
"longitude" : 23,
"latitude" : 88,
"band_4" : 564,
"band_8" : 1145
}
{
"longitude" : 21,
"latitude" : 56,
"band_4" : 461,
"band_8" : 5763
}
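Since each file is 10 GB, holding all of a.csv in an awk array may need more memory than is available. If both files list the same coordinates in the same order (an assumption; the sample in the question suggests it but does not say so), the rows can be paired as a stream instead. A minimal sketch, where `merge_bands` is a made-up helper name and not part of any tool:

```shell
# merge_bands A B: print one JSON document per line, pairing line N of A with line N of B.
# Assumption: both files hold the same "longitude,latitude" pairs in the same order.
merge_bands() {
    paste -d, "$1" "$2" | awk -F, '
        # after paste: $1,$2,$3 come from the first file and $4,$5,$6 from the second
        $1 == $4 && $2 == $5 {
            printf "{\"longitude\": %s, \"latitude\": %s, \"band_4\": %s, \"band_8\": %s}\n", $1, $2, $3, $6
            next
        }
        # rows whose coordinates do not line up are reported and skipped
        { print "line " NR ": coordinates differ, skipped" > "/dev/stderr" }
    '
}
```

One document per line is the JSON layout `mongoimport` reads by default, so the result could probably be piped straight in, for example `merge_bands a.csv b.csv | mongoimport --db mydb --collection bands` (the database and collection names here are placeholders). No `_id` is written, so MongoDB generates one per document.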
Answer 1 (score: 0)
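awk 'BEGIN{ FPAT="(^[^,]*,[^,]*)|([^,]*$)"; split( "longitude latitude band_4 band_8", N)}
# with this FPAT, $1 is the "longitude,latitude" pair and $2 is the last column
FNR==NR{ F[$1]=$0; next}          # first file: keep the whole line, keyed by the pair
($1 in F) {
split( F[$1]","$2, D, /,/)        # D = longitude, latitude, band_4, band_8
print " {\n\"_id\" : ObjectId(\"589b264efbb76e87b3611f3d\"),"
for (i=1; i<=4;i++) printf( "\"%s\" : %d%s\n", N[i], D[i], (i<4 ? "," : ""))
print " }"
}
' a.csv b.csv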
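This uses FPAT (GNU awk only; check with `awk --version`), which defines what a field contains. It is the reverse of FS, which specifies "what is not in a field".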
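To see how FPAT differs from FS, it may help to build the coordinate key from one input line both ways. A small sketch; the function names are invented for illustration, and `key_with_fpat` assumes GNU awk is installed as `gawk`:

```shell
# With FS you describe the separator, so the key is rebuilt from two fields.
key_with_fs() {
    awk -F, '{ print $1 FS $2 }'
}

# With FPAT (GNU awk only) you describe the field itself: the first alternative
# matches "longitude,latitude" as a single field, so $1 is already the key.
key_with_fpat() {
    gawk 'BEGIN { FPAT = "(^[^,]*,[^,]*)|([^,]*$)" } { print $1 }'
}
```

For the input line `23,88,564`, `printf '23,88,564\n' | key_with_fs` prints `23,88`, and `key_with_fpat` should print the same key where gawk is available.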