How to specify leading field names with mongoimport?

Asked: 2018-01-15 16:59:06

Tags: database mongodb csv import

I am importing a very large CSV file into MongoDB. Its format is as follows:

"zzzàms@hotmail.com","12071988"
"zzzг ms@hotmail.com","12071988"
"zzпїѕпїѕmmbbii2@bk.ru","MA15042002"
"zzпїѕпїѕmmbbii2@list.ru","MA15042002"
"zzпїѕпїѕmmbbii2@rambler.ru","MA15042002"
"zzпїѕпїѕmmbbii2@yandex.ru","MA15042002"

However, I do not know how many fields/columns will follow the email field.

I import the file with this command:

mongoimport -d emails -c second --file all.csv --type csv --fields email, number

However, any fields/columns after the number field are given the default names "field2", "field3", and so on.

{ "_id" : ObjectId("5a5cd95e598f1e910d353e3b"), "email" : "00-amber-00@embarqmail.com", " number" : "number1", "field2" : "number2" }

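How can I put everything that follows the number field into the same column, so that it is all stored under "number"?

Sometimes a single entry may have 40 columns.

I would prefer not to modify the CSV file unless it is really necessary.

Sorry, English is not my first language. Thank you.

1 Answer:

Answer 0: (score: 0)

You can use a Unix tool such as awk to turn each line into a JSON document according to your own logic, and then pipe the result to mongoimport through stdin.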

Sample file

saravana@ubuntu:~$ cat sample-doc.txt 
"zzzàms@hotmail.com","12071988"
"zzzг ms@hotmail.com","12071988"
"zzпїѕпїѕmmbbii2@bk.ru","MA15042002"
"zzпїѕпїѕmmbbii2@list.ru","MA15042002"
"zzпїѕпїѕmmbbii2@rambler.ru","MA15042002","34534"
"zzпїѕпїѕmmbbii2@yandex.ru","MA15042002","1232434","3435435","53534"

Converting to JSON with awk: the email, followed by an array of numbers

saravana@ubuntu:~$ cat sample-doc.txt | awk 'BEGIN{FS=","}{print "{ email :" $1 ", numbers : [ " substr($0,length($1)+2) " ] } " }'
{ email :"zzzàms@hotmail.com", numbers : [ "12071988" ] } 
{ email :"zzzг ms@hotmail.com", numbers : [ "12071988" ] } 
{ email :"zzпїѕпїѕmmbbii2@bk.ru", numbers : [ "MA15042002" ] } 
{ email :"zzпїѕпїѕmmbbii2@list.ru", numbers : [ "MA15042002" ] } 
{ email :"zzпїѕпїѕmmbbii2@rambler.ru", numbers : [ "MA15042002","34534" ] } 
{ email :"zzпїѕпїѕmmbbii2@yandex.ru", numbers : [ "MA15042002","1232434","3435435","53534" ] } 
saravana@ubuntu:~$ 
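
The output above uses unquoted keys (email, numbers), which strict JSON parsers generally reject. As a minimal sketch, assuming every CSV field is already wrapped in double quotes and contains no embedded commas, quotes, or backslashes, the same awk logic can emit strictly valid JSON. The function name to_json is hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Sketch: turn each CSV line into one JSON document with quoted keys.
# Assumption: fields are already double-quoted and contain no embedded
# commas, quotes, or backslashes, so they can be reused as JSON strings.
# to_json is a hypothetical helper name, not part of awk or mongoimport.
to_json() {
  awk 'BEGIN { FS = "," } {
    # Everything after the first field and its trailing comma.
    rest = substr($0, length($1) + 2)
    printf "{ \"email\": %s, \"numbers\": [ %s ] }\n", $1, rest
  }' "$@"
}
```

For example, `to_json sample-doc.txt` reads the sample file directly, while calling `to_json` with no arguments reads from stdin. A row that contains only an email yields an empty numbers array.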

Importing with mongoimport from stdin

saravana@ubuntu:~$ cat sample-doc.txt | awk 'BEGIN{FS=","}{print "{ email :" $1 ", numbers : [ " substr($0,length($1)+2) " ] } " }' | mongoimport --type json --db test --collection emailnos -v
2018-01-17T09:58:11.559+0530    reading from stdin
2018-01-17T09:58:11.559+0530    using fields: 
2018-01-17T09:58:11.561+0530    connected to: localhost
2018-01-17T09:58:11.561+0530    ns: test.emailnos
2018-01-17T09:58:11.561+0530    connected to node type: standalone
2018-01-17T09:58:11.561+0530    using write concern: w='1', j=false, fsync=false, wtimeout=0
2018-01-17T09:58:11.561+0530    using write concern: w='1', j=false, fsync=false, wtimeout=0
2018-01-17T09:58:11.726+0530    imported 6 documents
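
Because awk processes its input one line at a time, the same pipeline should also suit the very large file from the question, and awk can read the file directly without cat. The following is a sketch under stated assumptions rather than a definitive script: the database and collection names (emails, second) come from the question's command, while the function name import_emails and the IMPORT_CMD variable are hypothetical. IMPORT_CMD is not a mongoimport feature; it only allows a dry run that prints the generated documents instead of importing them.

```shell
#!/bin/sh
# Sketch: stream a CSV file through awk straight into an importer command.
# Assumption: fields are double-quoted with no embedded commas or quotes.
# import_emails and IMPORT_CMD are hypothetical names used for illustration.
import_emails() {
  csv_file=$1
  # Default importer; set IMPORT_CMD=cat to preview the documents instead.
  importer=${IMPORT_CMD:-"mongoimport --type json --db emails --collection second"}
  awk 'BEGIN { FS = "," } {
    printf "{ \"email\": %s, \"numbers\": [ %s ] }\n", $1, substr($0, length($1) + 2)
  }' "$csv_file" | $importer   # left unquoted on purpose so the command is word-split
}
```

Running it with IMPORT_CMD set to cat and all.csv as the argument previews the output. With IMPORT_CMD unset, the same call pipes the documents to mongoimport.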

Resulting collection

> db.emailnos.find()
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da90"), "email" : "zzzàms@hotmail.com", "numbers" : [ "12071988" ] }
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da91"), "email" : "zzпїѕпїѕmmbbii2@list.ru", "numbers" : [ "MA15042002" ] }
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da92"), "email" : "zzпїѕпїѕmmbbii2@rambler.ru", "numbers" : [ "MA15042002", "34534" ] }
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da93"), "email" : "zzпїѕпїѕmmbbii2@yandex.ru", "numbers" : [ "MA15042002", "1232434", "3435435", "53534" ] }
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da94"), "email" : "zzzг ms@hotmail.com", "numbers" : [ "12071988" ] }
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da95"), "email" : "zzпїѕпїѕmmbbii2@bk.ru", "numbers" : [ "MA15042002" ] }
>