How to bulk insert into CouchDB from a huge JSON file (460 MB)

Time: 2012-06-11 11:48:10

Tags: json couchdb bulkinsert

I need to bulk insert documents into a CouchDB database. I am trying to follow the instructions here: http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API

Here is my script:

~$ DB="http://localhost:5984/employees"
~$ curl -H "Content-Type:application/json" -d @employees_selfContained.json -vX POST $DB/_bulk_docs

The file employees_selfContained.json is huge: 465 MB. I validated it with JSONLint and found no errors.

Here is curl's verbose output:

* About to connect() to 127.0.0.1 port 5984 (#0)
* Trying 127.0.0.1... connected
* Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)
> POST /employees/_bulk_docs HTTP/1.1
> User-Agent: curl/7.19.7 (i486-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
> Host: 127.0.0.1:5984
> Accept: */*
> Content-Type:application/json
> Content-Length: 439203931
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* Empty reply from server
* Connection #0 to host 127.0.0.1 left intact
curl: (52) Empty reply from server
* Closing connection #0

How can I do a bulk insert from that single huge file? If possible, I would rather not split the file into smaller pieces.

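Edit: in case anyone is wondering, I am trying to convert this schema: http://dev.mysql.com/doc/employee/en/sakila-structure.html into a self-contained document database with the following structure: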

{
    "docs": [
        {
            "emp_no": ..,
            "birth_date": ..,
            "first_name": ..,
            "last_name" : ..,
            "gender": ..,
            "hire_date": .., 
            "titles": 
                [
                    {
                    "title": ..,
                    "from_date": .., 
                    "to_date": ..
                    },
                    {..}
                ], 
            "salaries" : 
                [
                    {
                    "salary": ..,
                    "from_date": ..,
                    "to_date": ..
                    },
                    {..}                
                ], 
            "dept_emp": 
                [ 
                    {
                    "dept_no": ..,
                    "from_date": ..,
                    "to_date": ..
                    },
                    {..}
                ], 
            "dept_manager": 
                [ 
                    {
                    "dept_no": ..,
                    "from_date": ..,
                    "to_date": ..
                    },
                    {..}
                ], 
            "departments":
                [
                    {
                    "dept_no": .., 
                    "dept_name": ..
                    },
                    {..}
                ]
        } ,
        .
        .
        {..}
    ]
} 

1 answer:

Answer 0 (score: 1)

Loop over the JSON and insert the documents in batches of 10-50k.
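
One way to do that without splitting the file by hand might look like the Python sketch below. It is only an illustration under some assumptions: the file has the {"docs": [ ... ]} layout shown in the question, every element of the array is a JSON object, and the database URL is the one from the question. The helper names (iter_docs, batched, post_bulk, bulk_insert) are made up for this example. The file is read in chunks and decoded one document at a time with the standard library's json.JSONDecoder.raw_decode, so the whole 460 MB never has to sit in memory at once.

```python
import itertools
import json
import urllib.request


def iter_docs(fp, chunk_size=1 << 20):
    """Yield the objects of the first JSON array in a text file, one at a time.

    Assumes the layout {"docs": [ {...}, {...} ]} and that every array
    element is a JSON object (so a partially read element never decodes).
    """
    decoder = json.JSONDecoder()
    buf = ""
    # Read until the opening '[' of the array has been seen.
    while "[" not in buf:
        chunk = fp.read(chunk_size)
        if not chunk:
            raise ValueError("no JSON array found")
        buf += chunk
    buf = buf[buf.index("[") + 1:]
    while True:
        # Drop whitespace and the comma separating two elements.
        buf = buf.lstrip().lstrip(",").lstrip()
        if buf.startswith("]"):
            return  # end of the array
        try:
            doc, end = decoder.raw_decode(buf)
        except json.JSONDecodeError:
            # The element is incomplete: read more and retry.
            more = fp.read(chunk_size)
            if not more:
                raise  # file ended in the middle of a document
            buf += more
            continue
        yield doc
        buf = buf[end:]


def batched(iterable, size):
    """Group an iterable into lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch


def post_bulk(db_url, docs):
    """POST one batch to CouchDB's _bulk_docs endpoint and return its reply."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    req = urllib.request.Request(
        db_url + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def bulk_insert(docs, db_url, batch_size=10000, post=post_bulk):
    """Send docs in batches; `post` can be swapped out for a dry run."""
    total = 0
    for batch in batched(docs, batch_size):
        post(db_url, batch)
        total += len(batch)
    return total
```

It could be called along these lines: open employees_selfContained.json in text mode with encoding="utf-8", then run bulk_insert(iter_docs(f), "http://localhost:5984/employees", batch_size=10000). The batch size is a guess taken from the 10-50k range above and may need tuning, and this sketch does not inspect the per-document results that _bulk_docs returns.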