Python: How do I "stream" data from a MongoDB collection?

Time: 2018-11-13 17:37:44

Tags: python mongodb

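I am running a script that pushes some data to a MongoDB database. Now I am trying to get another Python script to print new entries each time one is added to the DB.

For example:

If the number 80 is added to the database, the script should fetch 80 from the collection and print it to my console as soon as it has been added.

What I have actually works. The only problem is that if I remove time.sleep(), it starts printing every entry as fast as it can.

Also, right now it does not print only the new entry: it prints the whole collection plus the new entry. (I want only the new entries because, in the future, I would like my script to pull the data and later feed it into a Python array.)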

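  1. I can't use change_stream because my database is not a replica set, and I am new to this, so I don't know much about replica sets.
  2. A tailable cursor would work, but a capped collection is not the best option for me: I will be pushing data every 5 seconds, and setting a "limit" (isn't that what capped means?) is not ideal.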
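
On point 2: a capped collection is bounded by a size in bytes (optionally also by a maximum document count), and once it is full the oldest documents are overwritten. As a rough sketch of how large that bound would have to be at one insert every 5 seconds, the arithmetic looks like this. The 64-byte average document size and the 30-day retention window are assumptions chosen for illustration, not values from the question.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def docs_per_day(interval_seconds):
    """Number of documents written per day at one insert every interval_seconds."""
    return SECONDS_PER_DAY // interval_seconds


def capped_size_bytes(interval_seconds, avg_doc_bytes, retention_days):
    """Approximate byte budget needed to retain retention_days of documents."""
    return docs_per_day(interval_seconds) * retention_days * avg_doc_bytes


# One insert every 5 seconds, assumed 64-byte documents, assumed 30 days kept.
print(docs_per_day(5))               # 17280
print(capped_size_bytes(5, 64, 30))  # 33177600 (about 31.6 MiB)
```

A value like this could be passed as the size option when creating the collection, for example db.create_collection("readings", capped=True, size=33177600), where "readings" is a hypothetical collection name.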

Any suggestions?

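from pymongo import MongoClient
import time

client = MongoClient(port=27017)

arr = []  # meant to collect the values later (not used yet)

db = client.one  # database "one"

while True:
    # find() with no filter returns every document, so each pass
    # reprints the whole collection, not just the new entries.
    cursor = db.mycol.find()  # collection "mycol" in database "one"
    for document in cursor:
        print(document['num'])
    time.sleep(2)  # without this pause the loop re-queries as fast as it can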

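2 Answers:

Answer 0 (score: 0)

You can store each document's creation time (for example in a 'created' field set by the writer) and repeatedly query for the documents created since the last query: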

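import datetime
import time
...

# Assumes the writer stores a naive-UTC datetime in 'created' on every document.
# Starting from the epoch makes the first pass return the existing documents too.
last_seen = datetime.datetime(1970, 1, 1)
while True:
    cursor = db.mycol.find({'created': {'$gt': last_seen}}).sort('created', 1)
    for document in cursor:
        print(document['num'])
        last_seen = document['created']  # advance past what was printed
    time.sleep(2)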
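
If the documents do not carry a 'created' field, the default ObjectId stored in '_id' can play a similar role, because it embeds a creation timestamp and increases over time for a single writer. The following is a minimal sketch of that variant, not a definitive implementation: fetch_new and follow are hypothetical helper names, and the ordering guarantee is an assumption that holds best when only one process inserts documents.

```python
import time


def fetch_new(collection, last_id=None):
    """Return (documents, newest_id) for documents whose _id is greater than last_id."""
    # With no last_id yet, an empty filter returns the whole collection once.
    query = {} if last_id is None else {"_id": {"$gt": last_id}}
    docs = list(collection.find(query).sort("_id", 1))
    if docs:
        last_id = docs[-1]["_id"]  # remember the newest document seen so far
    return docs, last_id


def follow(collection, last_id=None, interval=2.0):
    """Yield documents as they appear, polling every `interval` seconds."""
    while True:
        docs, last_id = fetch_new(collection, last_id)
        yield from docs
        time.sleep(interval)  # pause between polls to avoid hammering the server
```

With the collection from the question, this could be used as: for document in follow(db.mycol): print(document['num']). Each document is printed once, and the values can be appended to a Python list inside the same loop.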

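Answer 1 (score: 0)

You can use the following code to connect a Mongo client, plot a live graph, and stream the data.

First create a new file named mongo_db_connectivity.py and put the following code in it: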

import os
import urllib.parse

import pandas as pd
from dotenv import load_dotenv
from pymongo import MongoClient
from pymongo.errors import InvalidName


class Database:

    def __init__(self):
        # Read DB_HOST, DB_PORT, DB_NAME, DB_USERNAME and DB_PASSWORD from .env
        load_dotenv(".env")
        print("Loading MongoDB")
        self.db_host = os.getenv('DB_HOST')
        self.db_port = int(os.getenv('DB_PORT', 27017))
        self.db_name = os.getenv('DB_NAME')
        self.db_username = os.getenv('DB_USERNAME')
        self.db_password = os.getenv('DB_PASSWORD')

        self.database = self.connection(self.db_host, self.db_port, self.db_name, self.db_username, self.db_password)
        if self.database is not None:
            print("MongoDB connected successfully")

    def update_mongo(self, db_collection, data):
        """Insert one flat dict `data` into `db_collection`; return True on success."""
        try:
            # Build a one-row DataFrame and convert it back to a list of records
            df = pd.DataFrame(data, index=[0])
            records = df.to_dict(orient='records')

            print("proceeding to mongo update")
            self.database[db_collection].insert_many(records)
            print("mongo update completed")
            return True
        except Exception as error:
            print(error)
            return False

    def connection(self, host, port, database_name, username, password):
        """Return the named database, or None if connecting fails."""
        try:
            if username and password:
                # Percent-encode the credentials so characters such as '@' or ':' are safe in the URI
                mongodb_uri = 'mongodb://%s:%s@%s:%s' % (
                    urllib.parse.quote_plus(username), urllib.parse.quote_plus(password), host, port)
                client = MongoClient(mongodb_uri)
            else:
                client = MongoClient(host, port)

            # Validate that the database exists
            if database_name in client.list_database_names():
                return client[database_name]
            raise InvalidName('Database does not exist')
        except Exception as error:
            print('error', error)
            return None

    def find(self, collection_name, condition):
        """Return a cursor over the matching documents, or -1 if the lookup fails."""
        try:
            if collection_name in self.database.list_collection_names():
                return self.database[collection_name].find(condition)
            raise InvalidName('Collection does not exist')
        except Exception as error:
            print("collection name error %s: %s" % (collection_name, error))
            return -1
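
The URI-building step in connection can be looked at in isolation. The sketch below shows why the credentials are passed through urllib.parse.quote_plus: characters such as '@', ':' and '/' would otherwise be read as URI delimiters. build_mongo_uri is a hypothetical helper name, and the sample credentials are made up for illustration.

```python
import urllib.parse


def build_mongo_uri(host, port, username=None, password=None):
    """Return a mongodb:// URI, percent-encoding the credentials if present."""
    if username and password:
        user = urllib.parse.quote_plus(username)
        secret = urllib.parse.quote_plus(password)
        return "mongodb://%s:%s@%s:%s" % (user, secret, host, port)
    # No credentials: connect anonymously
    return "mongodb://%s:%s" % (host, port)


print(build_mongo_uri("localhost", 27017, "user", "p@ss/w:rd"))
# mongodb://user:p%40ss%2Fw%3Ard@localhost:27017
```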

Create another file named .env that holds all the environment variables:

DB_HOST="your host"
DB_PORT=27017
DB_NAME="database name"
DB_USERNAME="username"
DB_PASSWORD="password"
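
Because every value read from the environment arrives as a string (or is missing altogether), it can help to validate the settings in one place before connecting. This is a minimal sketch using only the standard library; load_settings is a hypothetical helper, and treating DB_HOST and DB_NAME as required while defaulting DB_PORT to 27017 is an assumption.

```python
import os


def load_settings(env=None):
    """Build MongoDB connection settings from environment-style variables."""
    env = os.environ if env is None else env
    # Assumed to be mandatory; the credentials stay optional.
    missing = [key for key in ("DB_HOST", "DB_NAME") if not env.get(key)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return {
        "host": env["DB_HOST"],
        "port": int(env.get("DB_PORT", 27017)),  # environment values are strings
        "name": env["DB_NAME"],
        "username": env.get("DB_USERNAME") or None,
        "password": env.get("DB_PASSWORD") or None,
    }
```

Calling load_settings() after load_dotenv(".env") would read the values shown above; passing a plain dict instead makes the function easy to try without touching the real environment.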

Create a final file named proplot.py with the following code:

import datetime

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.animation import FuncAnimation

from mongo_db_connectivity import Database

plt.style.use('fivethirtyeight')


class Sample:

    def __init__(self):
        self.data = Database()
        self.timepresent = str(datetime.datetime.now())  # sending timestamp also

    def values(self, i):
        # Called by FuncAnimation once per frame: re-query and redraw
        try:
            # Replace the <...> placeholders with your collection name and filter
            col = self.data.find(<collection>, {<your_condition>})

            df = pd.DataFrame(list(col))
            data = df.reset_index()

            x = data['index']
            y1 = data['readings']  # polling readings; change to your column name

            plt.cla()
            plt.plot(x, y1, label='Label_name')
            plt.legend(loc='upper left')
            plt.tight_layout()
        except Exception as error:
            print(error)

    def function(self):
        # Keep a reference to the animation so it is not garbage-collected
        ani = FuncAnimation(plt.gcf(), self.values, interval=1000)
        plt.tight_layout()
        plt.show()

    def graph_final(self):
        try:
            self.function()
        except Exception as error:
            print("no graph data found, retrying")
            self.graph_final()


Sample().graph_final()
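
The plotting code above re-reads the whole collection on every frame. If only new documents are fetched each time, they can be accumulated in a bounded buffer that the plot draws from. The following is a rough sketch of such a buffer, not part of the answer's code: ReadingBuffer is a hypothetical class, the 'readings' field name is taken from the code above, and the default limit of 500 points is an assumption.

```python
from collections import deque


class ReadingBuffer:
    """Keep the most recent readings for plotting, dropping the oldest ones."""

    def __init__(self, maxlen=500, field="readings"):
        self.values = deque(maxlen=maxlen)  # old values fall off the left end
        self.field = field
        self.total = 0  # number of readings ever appended

    def extend(self, documents):
        """Append the reading from each new document that has the field."""
        for doc in documents:
            if self.field in doc:
                self.values.append(doc[self.field])
                self.total += 1

    def xy(self):
        """Return (x, y) lists, where x is the running index of each reading."""
        start = self.total - len(self.values)
        return list(range(start, self.total)), list(self.values)
```

Inside values, the newly fetched documents would be passed to extend, and the result of xy() would be handed to plt.plot in place of the DataFrame columns.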