Question

I'm sure there is a mistake in my code, since I'm new to PyMongo, but I'll give it a try. The MongoDB collection holds 167k+ documents that look like this:

{'overall': 5.0,
 'reviewText': {'ago': 1,
                'buy': 2,
                'daughter': 1,
                'holiday': 1,
                'love': 2,
                'niece': 1,
                'one': 2,
                'still': 1,
                'today': 1,
                'use': 1,
                'year': 1},
 'reviewerName': 'dcrm'}

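I want to tally all the words used in the reviewText field across all 5.0-rated reviews. I ran the code below, but I get the following error. Any insight?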

#1 Find the top 20 most common words found in 1-star reviews.

aggr = [{"$unwind": "$reviewText"}, 
        {"$group": { "_id": "$reviewText", "word_freq": {"$sum":1}}}, 
        {"$sort": {"word_freq": -1}},
        {"$limit": 20},
        {"$project": {"overall":"$overall", "word_freq":1}}]
disk_use = { 'allowDiskUse': True }
findings = list(collection.aggregate(aggr, disk_use))

for item in findings:
    p(item)

As you can see, I added the 'allowDiskUse' option because I ran into the 100MB threshold. But the error I get is:

AttributeError: 'dict' object has no attribute '_txn_read_preference'

Answer 1

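You are close. allowDiskUse is a named (keyword) argument, not a dictionary. When the dict is passed positionally, it lands in the second parameter of aggregate(), which is session, so PyMongo treats it as a session object and fails on the missing attribute. The statement should look like this: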

findings = list(collection.aggregate(aggr, allowDiskUse=True))

or

findings = list(collection.aggregate(aggr, **disk_use))
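
The difference between the failing call and the two working ones can be reproduced without a database. The sketch below uses a hypothetical stand-in function, `aggregate`, that only mimics the shape of the signature `aggregate(pipeline, session=None, **kwargs)`. It is not PyMongo's implementation, and the call to `session._txn_read_preference()` inside it is an assumption made to mirror the attribute named in the error message.

```python
# Hypothetical stand-in that mimics the shape of
# Collection.aggregate(pipeline, session=None, **kwargs).
# This is NOT PyMongo's code; it only illustrates argument binding.
def aggregate(pipeline, session=None, **kwargs):
    if session is not None:
        # Assumption: a real session object exposes this method,
        # while a plain dict does not.
        session._txn_read_preference()
    # Return the options that were received as keyword arguments.
    return kwargs


disk_use = {"allowDiskUse": True}
pipeline = [{"$limit": 20}]

# Positional dict: it is bound to `session`, not treated as options.
try:
    aggregate(pipeline, disk_use)
    positional_error = None
except AttributeError as exc:
    positional_error = str(exc)

# Keyword argument: arrives in **kwargs as intended.
keyword_result = aggregate(pipeline, allowDiskUse=True)

# Dict unpacking: equivalent to the keyword form.
unpacked_result = aggregate(pipeline, **disk_use)

print(positional_error)
print(keyword_result, unpacked_result)
```

In this sketch, the positional call raises an AttributeError, while the keyword form and the `**disk_use` form both deliver the same options, which is why either corrected statement above should work.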
