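I'm a Python beginner. I want to merge two data frames and compare the dates between them. One comes from MongoDB and the other from reading an Excel file, as shown below:
MongoDB: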
{ "_id" : ObjectId("5bbc86e5c16a27f1e1bd39f8"), "name" : "swetha", "nameId" : 123, "source" : "Blore", "sourceId" : 10, "LastUpdate" : "10-Oct-2018" }
{ "_id" : ObjectId("5bbc86e5c16a27f1e1bd39f9"), "name" : "swetha", "nameId" : 123, "source" : "Mlore", "sourceId" : 11, "LastUpdate" : "11-Oct-2018" }
{ "_id" : ObjectId("5bbc86e5c16a27f1e1bd39fa"), "name" : "swathi", "nameId" : 124, "source" : "Mlore", "sourceId" : 11, "LastUpdate" : "9-Oct-2018" }
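Since the goal is to compare dates, note that `LastUpdate` is stored as a plain string such as "9-Oct-2018", so it has to be parsed before any date arithmetic. Below is a minimal sketch that uses only the standard library. It assumes every value follows the day-abbreviatedmonth-year pattern and that the default (English/C) locale is in effect. `parse_last_update` is a hypothetical helper name.

```python
from datetime import datetime

# LastUpdate strings copied from the two "swetha" documents above
last_updates = {"Blore": "10-Oct-2018", "Mlore": "11-Oct-2018"}


def parse_last_update(value: str) -> datetime:
    """Parse strings such as '9-Oct-2018' (day-abbreviated month-year).

    %d also accepts a single-digit day like '9'; %b expects an English
    month abbreviation, which assumes the default C/English locale.
    """
    return datetime.strptime(value, "%d-%b-%Y")


dates = {source: parse_last_update(text) for source, text in last_updates.items()}
gap = dates["Mlore"] - dates["Blore"]  # subtracting datetimes gives a timedelta
print(gap.days)  # 1
```

Once the strings are parsed, the resulting `timedelta` exposes `.days`, which can be compared against a day count such as the values in the Excel `Tolerance(days)` column.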
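From the MongoDB data above I retrieved the duplicate records, i.e. the ones whose name is "swetha". Now I want to merge these duplicate records with the Excel file I have read, shown below: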
[{'source': ['Blore', 'Mlore'], 'P.weight': [100, 200], 'N.weight': [-100, -200], 'Tolerance(days)': [0, 30], 'Durability(Days)': [0, 365]}]
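For reference, the `$group`/`$match` pipeline in the code below keeps only names that occur more than once. If the documents are loaded into a pandas DataFrame instead, `DataFrame.duplicated` can select the same rows. The sketch below assumes the three sample documents above, with `_id` dropped, as its input.

```python
import pandas as pd

# The three sample documents from the question, without the _id field
records = [
    {"name": "swetha", "nameId": 123, "source": "Blore", "sourceId": 10, "LastUpdate": "10-Oct-2018"},
    {"name": "swetha", "nameId": 123, "source": "Mlore", "sourceId": 11, "LastUpdate": "11-Oct-2018"},
    {"name": "swathi", "nameId": 124, "source": "Mlore", "sourceId": 11, "LastUpdate": "9-Oct-2018"},
]
people = pd.DataFrame(records)

# keep=False flags every row whose name occurs more than once, which
# mirrors grouping on name and keeping groups with count > 1
dupes = people[people.duplicated(subset="name", keep=False)]
print(dupes[["name", "source", "LastUpdate"]])
```

Unlike the pipeline, this produces flat rows rather than nesting them under `data`, and flat rows are the shape that `DataFrame.merge` works with. With a live connection, the frame could be built with `pd.DataFrame(list(collection.find({}, {"_id": 0})))`.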
I have the following code to do the merge, but the result contains duplicate records.
The code is as follows:
import json
import pandas as pd
import xlrd
from pymongo import MongoClient
from functools import reduce
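from pymongo.errors import ConnectionFailure
try:
    client = MongoClient()
    # MongoClient() connects lazily, so send a ping to check the server is reachable
    client.admin.command("ping")
    print("Connected successfully!!!")
except ConnectionFailure:
    print("Could not connect to MongoDB")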
# database
db = client.conn
collection = db.contactReg
df = collection.aggregate([
    {
        "$group": {
            "_id": "$name",
            "count": {"$sum": 1},
            "data": {
                "$push": {
                    "nameId": "$nameId",
                    "source": "$source",
                    "sourceId": "$sourceId",
                    "LastUpdate": "$LastUpdate"
                }
            }
        }
    },
    {"$match": {"count": {"$gt": 1}}}
])
res = list(df)
print(res)
# reading the excel sheet
frames = pd.read_excel(r"C:\Users\swetha1\Desktop\rules.xlsx",
sheet_name=None)
dicts = [df1.to_dict('list') for df1 in frames.values()]
print(dicts)
merge = list(res + dicts)
print(merge)
Output:
[{'_id': 'swetha', 'count': 2, 'data': [{'nameId': 123.0, 'source': 'Blore', 'sourceId': 10.0, 'LastUpdate': '10-Oct-2018'}, {'nameId': 123.0, 'source': 'Mlore', 'sourceId': '11', 'LastUpdate': '11-Oct-2018'}]}]
[{'source': ['Blore', 'Mlore'], 'P.weight': [100, 200], 'N.weight': [-100, -200], 'Tolerance(days)': [0, 30], 'Durability(Days)': [0, 365]}]
[{'_id': 'swetha', 'count': 2, 'data': [{'nameId': 123.0, 'source': 'Blore', 'sourceId': 10.0, 'LastUpdate': '10-Oct-2018'}, {'nameId': 123.0, 'source': 'Mlore', 'sourceId': '11', 'LastUpdate': '11-Oct-2018'}]}, {'source': ['Blore', 'Mlore'], 'P.weight': [100, 200], 'N.weight': [-100, -200], 'Tolerance(days)': [0, 30], 'Durability(Days)': [0, 365]}]
In the output, "source" is duplicated: it appears once inside each MongoDB record and again in the Excel data.
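The duplication arises because `res + dicts` only concatenates two lists, and nothing relates the MongoDB records to the Excel rules. One way to join them is to flatten the nested `data` arrays into rows and then merge on the shared `source` column. The sketch below assumes pandas 1.0 or later (for `pd.json_normalize`) and hard-codes the printed output above as its input.

```python
import pandas as pd

# The aggregation result printed above: one entry per duplicated name
res = [{'_id': 'swetha', 'count': 2, 'data': [
    {'nameId': 123.0, 'source': 'Blore', 'sourceId': 10.0, 'LastUpdate': '10-Oct-2018'},
    {'nameId': 123.0, 'source': 'Mlore', 'sourceId': '11', 'LastUpdate': '11-Oct-2018'}]}]

# The Excel rules printed above, rebuilt as a DataFrame (one row per source)
rules = pd.DataFrame({'source': ['Blore', 'Mlore'], 'P.weight': [100, 200],
                      'N.weight': [-100, -200], 'Tolerance(days)': [0, 30],
                      'Durability(Days)': [0, 365]})

# Flatten: one row per element of 'data', carrying the group's _id along
people = pd.json_normalize(res, record_path="data", meta=["_id"])

# Join on the shared key so 'source' appears once per row
merged = people.merge(rules, on="source", how="left")
print(merged[["_id", "source", "LastUpdate", "Tolerance(days)"]])
```

`how="left"` keeps every MongoDB record even when its `source` has no matching row in the sheet. In that case the rule columns for the record are filled with `NaN`.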
When I added the following code to remove the duplicates, it raised an error:
merge = list(set(res + dicts))
Error:
merge = list(set(res + dicts))
TypeError: unhashable type: 'dict'
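`set()` requires hashable elements. `dict` objects are mutable and therefore unhashable, which is why this raises `TypeError`. A common workaround is to derive a hashable key from each dict, such as a canonical JSON string. In the sketch below, `unique_dicts` is a hypothetical helper. It only drops dicts that are identical, so it would keep both elements of `res + dicts` because they differ. Collapsing the shared `source` field requires a join on that key instead.

```python
import json


def unique_dicts(items):
    """Return items with exact duplicate dicts removed, keeping first-seen order.

    Each dict is serialised to JSON with sorted keys, so dicts with equal
    contents map to the same string; default=str covers values json cannot
    encode natively (for example ObjectId or datetime).
    """
    seen = set()
    result = []
    for item in items:
        key = json.dumps(item, sort_keys=True, default=str)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


print(unique_dicts([{"a": 1, "b": 2}, {"b": 2, "a": 1}, {"a": [1, 2]}]))
```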
So what I want to do is: