Collection items in Python

Date: 2017-02-21 23:13:41

Tags: python

I have a collection of items in my MongoDB database that look like this:

{u'Keywords': [[u'european', 7], [u'bill', 5], [u'uk', 5], [u'years', 4], [u'brexit', 4]], u'Link': u'http://www.bbc.com/news/uk-politics-39042876', u'date': datetime.datetime(2017, 2, 21, 22, 47, 7, 463000), u'_id': ObjectId('58acc36b3040a218bc62c6d3')}
.....

These come from a MongoDB query:

    mydb = client['BBCArticles']
    ##mydb.adminCommand({'setParameter': True, 'textSearchEnabled': True})
    my_collection = mydb['Articles']
    print('Articles containing higher occurrences of the keyword are sorted as follows:')
    for doc in my_collection.find({"Keywords": {"$elemMatch": {"$elemMatch": {"$in": [keyword.lower()]}}}}):
        print(doc)
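The filter above matches a document when one inner [word, count] pair of Keywords contains the lower-cased keyword; for a string keyword that effectively means a pair whose word equals it. The query itself does no sorting, despite what the printed message says. A minimal pure-Python sketch of both steps, run on the fetched documents, might look like this. The helper names keyword_count and matching_sorted are hypothetical, and it assumes every Keywords entry is a two-element [word, count] list as in the sample document.

```python
def keyword_count(doc, keyword):
    """Return the count paired with `keyword` in doc['Keywords'], or None if absent.

    Assumes each Keywords entry is a [word, count] pair, as in the sample document.
    """
    keyword = keyword.lower()
    for word, count in doc.get('Keywords', []):
        if word == keyword:
            return count
    return None


def matching_sorted(docs, keyword):
    """Keep documents that contain `keyword`, highest count first."""
    hits = [d for d in docs if keyword_count(d, keyword) is not None]
    return sorted(hits, key=lambda d: keyword_count(d, keyword), reverse=True)
```

Because matching_sorted accepts any iterable, the results of my_collection.find(...) can be passed to it directly. Sorting happens client-side here, which is fine for small result sets; sorting inside MongoDB by the count of one specific keyword would need an aggregation pipeline instead.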

However, I want to print the documents like this:

doc1
Keywords: european, bill, uk
Link: "http://www.bbc.com/"

doc2
....

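1 answer:

Answer 0 (score: 0)

Since your query results behave like a list of dictionaries, they are iterable and can be processed with a for loop. If you only want the first three keywords and the site root of each link, this should work: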

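# c = your query results: a list of dictionaries or a pymongo cursor
# (Python 3; on Python 2 use `from urlparse import urlparse`)
from urllib.parse import urlparse

# enumerate() works on any iterable, including a cursor, which has no len()
for n, doc in enumerate(c, start=1):
    print('doc{n}'.format(n=n))
    # Each Keywords entry is a [word, count] pair; keep the first three words
    print('Keywords:', ', '.join(kw[0] for kw in doc['Keywords'][:3]))
    # Reduce the article URL to its scheme and host, e.g. http://www.bbc.com/
    parsed_uri = urlparse(doc['Link'])
    domain = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
    print('Link:', '"{0}"\n'.format(domain))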

Output:

doc1
Keywords: european, bill, uk
Link: "http://www.bbc.com/"
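If you want to check the formatting without a database, or tolerate documents that lack a field, the loop body can be moved into a function that returns the text instead of printing it. This is a sketch assuming Python 3; format_doc, format_all and the max_keywords parameter are hypothetical names, and missing fields simply render as empty values.

```python
from urllib.parse import urlparse


def format_doc(n, doc, max_keywords=3):
    """Render one document as a 'docN / Keywords / Link' block.

    Missing Keywords or Link fields render as empty values instead of raising KeyError.
    """
    words = [kw[0] for kw in doc.get('Keywords', [])[:max_keywords]]
    uri = urlparse(doc.get('Link', ''))
    # Keep only scheme://host/ when the link has a host part
    root = '{uri.scheme}://{uri.netloc}/'.format(uri=uri) if uri.netloc else ''
    return 'doc{}\nKeywords: {}\nLink: "{}"'.format(n, ', '.join(words), root)


def format_all(docs):
    """Number documents from 1 and separate the blocks with blank lines."""
    return '\n\n'.join(format_doc(n, doc) for n, doc in enumerate(docs, start=1))
```

Usage: print(format_all(my_collection.find(...))) prints every block, and the returned string is easy to compare in a unit test.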