Question

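My code uses pandas' built-in data collection routines to gather data for a large number of ETFs, with about 5 years of end-of-day data for each. My pandas DataFrame index is a DatetimeIndex.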

Creating the historical database worked fine. Now I want to update the database daily with a cron job, and for that I created a time "buffer".

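My idea was that when I try to insert data that is already in the database, mongo would skip it (after all, I use the date as the index). With that in mind, if one or several tickers had no data for the last few days, I could still end up with a complete database by attempting to insert redundant data.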

However, I get the following error:

DuplicateKeyError: E11000 duplicate key error collection: ETFs.EEM index: _id_ dup key: { : new Date(1524700800000) }

My understanding was that mongo should skip this. What is wrong with this code?

def insert_to_mongo():
    # etfs is assumed to map ticker symbol -> DataFrame indexed by date;
    # db is assumed to be the pymongo database handle.
    for x in etfs.keys():
        print('Processing ' + x)
        collection = db[x]  # one collection per ticker
        data_points = len(etfs[x])
        for i in range(data_points):
            row = etfs[x].iloc[i]
            data = {
                '_id': etfs[x].index[i],  # the date is used as the unique _id
                'open': float(row['open']),
                'high': float(row['high']),
                'low': float(row['low']),
                'close': float(row['close']),
                'adj close': float(row['adj close']),
                'volume': float(row['volume']),
            }
            # insert_one raises DuplicateKeyError if this _id already exists
            collection.insert_one(data)
    print('Inserted data successfully')

Trying to insert new data into MongoDB that may intentionally contain duplicates
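For what it's worth, insert_one does not skip existing keys; a duplicate _id is reported as the DuplicateKeyError shown above. One possible way around that is to upsert instead of insert. The following is only a minimal sketch under assumptions: upsert_rows is a hypothetical helper name, collection is assumed to be a pymongo Collection (its replace_one method is called with upsert=True), and rows is assumed to be any iterable of (date, row) pairs with the same column names as in the question.

```python
def upsert_rows(collection, rows):
    """Write each (date, row) pair, replacing any document with the same _id.

    Hypothetical helper: `collection` is assumed to behave like a pymongo
    Collection, `rows` is any iterable of (date, mapping) pairs.
    Returns the number of rows sent to the collection.
    """
    fields = ('open', 'high', 'low', 'close', 'adj close', 'volume')
    written = 0
    for date, row in rows:
        doc = {'_id': date}
        for name in fields:
            doc[name] = float(row[name])
        # upsert=True: insert when no document has this _id, otherwise
        # replace the existing one, so repeated dates raise no error.
        collection.replace_one({'_id': date}, doc, upsert=True)
        written += 1
    return written
```

With the data from the question this might be called as upsert_rows(db[x], etfs[x].iterrows()) inside the loop over tickers. Note that this overwrites existing documents rather than skipping them, which is harmless when the re-fetched values are identical.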

0 answers: