Question

I have data in a Mongo collection named "hello". The documents look like this:

{ 
name: ..., 
size: ..., 
timestamp: ISODate("2013-01-09T21:04:12Z"), 
data: { text:..., place:...},
other: ...
}

I want to export each document's timestamp and text to a CSV file, with the timestamp in the first column and the text in the second.

I tried creating a new collection (hello2) whose documents contain only the timestamp and the text.

data = db.hello
for i in data:
    try:
        connection.me.hello2.insert(i["data"]["text"], i["timestamp"])
    except:
        print "Unable", sys.exc_info()

Then I wanted to use mongoexport:

mongoexport --db me --collection hello2 --csv --out /Dropbox/me/hello2.csv

But this doesn't work, and I don't know how to proceed.

PS: I would also like to store only the time part of the ISODate in the CSV file, i.e. just 21:04:12 instead of ISODate("2013-01-09T21:04:12Z").

Thanks for your help.

Answer 1

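You can export directly from the data collection, with no need for a temporary collection:

# Project only the two needed fields; "text" lives inside the "data" subdocument.
for r in db.hello.find({}, {'data.text': 1, 'timestamp': 1}):
    # Timestamp first, then text, as the question asks.
    print('"%s","%s"' % (r['timestamp'].strftime('%H:%M:%S'), r['data']['text']))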

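Or write to a file:

# output is the path of the CSV file to write; set it before running this.
with open(output, 'w') as fp:
    for r in db.hello.find({}, {'data.text': 1, 'timestamp': 1}):
        fp.write('"%s","%s"\n' % (r['timestamp'].strftime('%H:%M:%S'), r['data']['text']))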
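
If the text can contain quotes, commas, or newlines, a hand-built format string produces malformed CSV. Here is a minimal sketch that uses the standard csv module instead, assuming documents shaped like the ones in the question. The helper name write_rows and the sample list are made up for illustration; the list stands in for the result of a find() call.

```python
import csv
import io
from datetime import datetime


def write_rows(docs, fp):
    """Write one CSV row per document: time of day first, then the text."""
    writer = csv.writer(fp, quoting=csv.QUOTE_ALL)
    for doc in docs:
        # PyMongo returns ISODate fields as datetime objects,
        # so strftime can extract just the time of day.
        writer.writerow([doc['timestamp'].strftime('%H:%M:%S'),
                         doc['data']['text']])


# Hypothetical sample data standing in for a query result.
sample = [
    {'timestamp': datetime(2013, 1, 9, 21, 4, 12),
     'data': {'text': 'say "hi", ok'}},
]

buf = io.StringIO()
write_rows(sample, buf)
print(buf.getvalue())
```

To write a real file, pass a file object opened with newline='' in place of the StringIO buffer, as the csv module documentation recommends.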

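To filter out duplicates and print only the most recent one, split the process into two steps. First, accumulate the data in a dictionary:

# Map each distinct text to the latest timestamp seen for it.
recs = {}
for r in db.hello.find({}, {'data.text': 1, 'timestamp': 1}):
    text, time = r['data']['text'], r['timestamp']
    if text not in recs or recs[text] < time:
        recs[text] = time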

Then output the dictionary contents:

for text, time in recs.items():
    # Timestamp first, then text, as the question asks.
    print('"%s","%s"' % (time.strftime('%H:%M:%S'), text))
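
The two steps can be tried without a database. The sketch below runs the same accumulation over a hand-made list; the function name latest_by_text and the sample documents are assumptions for illustration, not part of the original data.

```python
from datetime import datetime


def latest_by_text(docs):
    """Map each distinct text to the most recent timestamp seen for it."""
    recs = {}
    for doc in docs:
        text, time = doc['data']['text'], doc['timestamp']
        # Keep the entry only if it is new or newer than the stored one.
        if text not in recs or recs[text] < time:
            recs[text] = time
    return recs


# Hypothetical sample data: the text 'a' appears twice.
sample = [
    {'timestamp': datetime(2013, 1, 9, 21, 4, 12), 'data': {'text': 'a'}},
    {'timestamp': datetime(2013, 1, 9, 22, 0, 0), 'data': {'text': 'a'}},
    {'timestamp': datetime(2013, 1, 9, 20, 30, 0), 'data': {'text': 'b'}},
]

# Sorted by text only to make the output order predictable.
for text, time in sorted(latest_by_text(sample).items()):
    print('"%s","%s"' % (time.strftime('%H:%M:%S'), text))
```

Because the dictionary holds one entry per distinct text, memory use grows with the number of distinct texts rather than with the number of documents.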
