I am importing a very large CSV file into MongoDB. Its format looks like this:
"zzzàms@hotmail.com","12071988"
"zzzг ms@hotmail.com","12071988"
"zzпїѕпїѕmmbbii2@bk.ru","MA15042002"
"zzпїѕпїѕmmbbii2@list.ru","MA15042002"
"zzпїѕпїѕmmbbii2@rambler.ru","MA15042002"
"zzпїѕпїѕmmbbii2@yandex.ru","MA15042002"
However, I don't know how many fields/columns will follow the email field.
I import the file with this command:
mongoimport -d emails -c second --file all.csv --type csv --fields email, number
However, any fields/columns after the number field are given default names such as 'field2', 'field3', and so on:
{ "_id" : ObjectId("5a5cd95e598f1e910d353e3b"), "email" : "00-amber-00@embarqmail.com", " number" : "number1", "field2" : "number2" }
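How can I put everything that comes after the number field into the same column, so that it is all classified as 'number'?
Sometimes a single entry can have as many as 40 columns.
I would rather not modify the CSV file unless it is really necessary.
Sorry, English is not my first language. Thank you.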
Answer 0 (score: 0)
You can use a Unix command such as awk to turn each line into JSON according to your own logic, and pipe the result to mongoimport through stdin.
Sample file
saravana@ubuntu:~$ cat sample-doc.txt
"zzzàms@hotmail.com","12071988"
"zzzг ms@hotmail.com","12071988"
"zzпїѕпїѕmmbbii2@bk.ru","MA15042002"
"zzпїѕпїѕmmbbii2@list.ru","MA15042002"
"zzпїѕпїѕmmbbii2@rambler.ru","MA15042002","34534"
"zzпїѕпїѕmmbbii2@yandex.ru","MA15042002","1232434","3435435","53534"
Converting to JSON with awk: the email, followed by an array of the remaining values
saravana@ubuntu:~$ cat sample-doc.txt | awk 'BEGIN{FS=","}{print "{ email :" $1 ", numbers : [ " substr($0,length($1)+2) " ] } " }'
{ email :"zzzàms@hotmail.com", numbers : [ "12071988" ] }
{ email :"zzzг ms@hotmail.com", numbers : [ "12071988" ] }
{ email :"zzпїѕпїѕmmbbii2@bk.ru", numbers : [ "MA15042002" ] }
{ email :"zzпїѕпїѕmmbbii2@list.ru", numbers : [ "MA15042002" ] }
{ email :"zzпїѕпїѕmmbbii2@rambler.ru", numbers : [ "MA15042002","34534" ] }
{ email :"zzпїѕпїѕmmbbii2@yandex.ru", numbers : [ "MA15042002","1232434","3435435","53534" ] }
saravana@ubuntu:~$
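The one-liner above prints the keys email and numbers without quotes. Strict JSON requires quoted keys, so a variant that quotes them may be safer if the import tool you use parses JSON strictly. The following is only a sketch under some assumptions: the function name csv_to_json is made up for illustration, every field in the file is already wrapped in double quotes, and the email field itself contains no comma (awk splits on every comma, including ones inside quotes).

```shell
# Hypothetical helper (the name is an assumption, not part of the original answer):
# reads lines of the form "email","v1","v2",... on stdin and
# prints one JSON object per line with quoted keys.
csv_to_json() {
  awk 'BEGIN { FS = "," }
       NF > 0 {
         # Skip the first field plus the comma after it; the rest of the
         # line is already a comma-separated list of quoted strings.
         rest = substr($0, length($1) + 2)
         printf "{ \"email\": %s, \"numbers\": [ %s ] }\n", $1, rest
       }'
}
```

It could be used in place of the inline awk program, for example: csv_to_json < sample-doc.txt | mongoimport --type json --db test --collection emailnos. A line that holds only an email produces an empty numbers array.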
Importing with mongoimport through stdin
saravana@ubuntu:~$ cat sample-doc.txt | awk 'BEGIN{FS=","}{print "{ email :" $1 ", numbers : [ " substr($0,length($1)+2) " ] } " }' | mongoimport --type json --db test --collection emailnos -v
2018-01-17T09:58:11.559+0530 reading from stdin
2018-01-17T09:58:11.559+0530 using fields:
2018-01-17T09:58:11.561+0530 connected to: localhost
2018-01-17T09:58:11.561+0530 ns: test.emailnos
2018-01-17T09:58:11.561+0530 connected to node type: standalone
2018-01-17T09:58:11.561+0530 using write concern: w='1', j=false, fsync=false, wtimeout=0
2018-01-17T09:58:11.561+0530 using write concern: w='1', j=false, fsync=false, wtimeout=0
2018-01-17T09:58:11.726+0530 imported 6 documents
Resulting collection
> db.emailnos.find()
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da90"), "email" : "zzzàms@hotmail.com", "numbers" : [ "12071988" ] }
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da91"), "email" : "zzпїѕпїѕmmbbii2@list.ru", "numbers" : [ "MA15042002" ] }
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da92"), "email" : "zzпїѕпїѕmmbbii2@rambler.ru", "numbers" : [ "MA15042002", "34534" ] }
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da93"), "email" : "zzпїѕпїѕmmbbii2@yandex.ru", "numbers" : [ "MA15042002", "1232434", "3435435", "53534" ] }
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da94"), "email" : "zzzг ms@hotmail.com", "numbers" : [ "12071988" ] }
{ "_id" : ObjectId("5a5ed0dbead4f5f7ae68da95"), "email" : "zzпїѕпїѕmmbbii2@bk.ru", "numbers" : [ "MA15042002" ] }
>