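Consider a simple Riak key-value database that holds city names and a few tags associated with each city. I'm using the Python 3 client to create a bucket and add data: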
import riak
client = riak.RiakClient(pb_port=8087, protocol='pbc')
bucket = client.bucket('cities')
# Adding data to the bucket
bucket.new('tallinn', {'name': 'Tallinn', 'tags': ['architecture', 'food', 'port', 'forest']}).store()
bucket.new('riga', {'name': 'Riga', 'tags': ['food', 'architecture', 'forest']}).store()
bucket.new('vilnius', {'name': 'Vilnius', 'tags': ['beer', 'food', 'shopping']}).store()
bucket.new('kiev', {'name': 'Kiev'}).store()
I can then check what is in the bucket like this:
keys = client.get_keys(bucket) # Get all keys from bucket
print('Keys:', keys)
for key in keys:
    article = bucket.get(key).data # Get data by key from bucket
    print(article)
print(type(article)) # Check what is the type of object I get
Output:
Keys: ['tallinn', 'riga', 'kiev', 'vilnius']
{'name': 'Tallinn', 'tags': ['architecture', 'food', 'port', 'forest']}
{'name': 'Riga', 'tags': ['food', 'architecture', 'forest']}
{'name': 'Kiev'}
{'name': 'Vilnius', 'tags': ['beer', 'food', 'shopping']}
<class 'dict'>
As you can see, I get back exactly what I stored. And since each object is still a dictionary (<class 'dict'>), I can easily access any part of the data.
From this data, I want to use MapReduce to get the popularity of every tag that appears in it, as a sorted list of tuples or lists like this:
[(3, 'food'), (2, 'forest'), (2, 'architecture'), (1, 'shopping'), (1, 'port'), (1, 'beer')]
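To make the target concrete, here is a minimal client-side sketch in plain Python (no Riak involved) that produces that list from the same records. The names `cities` and `tag_popularity` are made up for illustration; this only pins down the expected result and is not the MapReduce solution itself.

```python
from collections import Counter

# The same records stored in the bucket above, held in memory for illustration.
cities = [
    {'name': 'Tallinn', 'tags': ['architecture', 'food', 'port', 'forest']},
    {'name': 'Riga', 'tags': ['food', 'architecture', 'forest']},
    {'name': 'Vilnius', 'tags': ['beer', 'food', 'shopping']},
    {'name': 'Kiev'},  # no 'tags' field at all
]

def tag_popularity(records):
    """Return (count, tag) tuples sorted from most to least popular."""
    counts = Counter()
    for record in records:
        # Records without a 'tags' field contribute nothing.
        counts.update(record.get('tags', []))
    # Sorting the tuples in reverse orders by count first, then by tag name.
    return sorted(((n, tag) for tag, n in counts.items()), reverse=True)

print(tag_popularity(cities))
```

Sorting `(count, tag)` tuples in reverse is what puts `'forest'` ahead of `'architecture'` among the tags with a count of 2.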
With the following code I can get the list of tags from each key-value pair:
query = client.add('cities')
# JavaScript functions for the Map phase and the Reduce phase
js_func_map = "function(v) {var val = JSON.parse(v.values[0].data);"\
              "return[val.tags];}"
js_func_reduce = "function(values) {return values;}"
query.map(js_func_map) # Add JavaScript function to Map phase
query.reduce(js_func_reduce) # Add JavaScript function to Reduce phase
# Get result from query
for result in query.run():
    print(result)
However, this is still far from what I intended:
['bear', 'architecture', 'forest']
None
['architecture', 'food', 'port', 'forest']
['bear', 'food', 'shopping']
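The `None` line most likely comes from the record that has no `tags` field, and the identity reduce function simply passes the per-city lists through without counting anything. As a rough sketch of the shape the two phases would need, here is a pure-Python simulation that runs locally without Riak. `map_phase` and `reduce_phase` are hypothetical names, and the logic would still have to be ported to the JavaScript phase functions. It assumes that a reduce function may be applied again to its own partial output, so it accepts and returns the same structure.

```python
from collections import Counter

# In-memory stand-in for the bucket contents (illustration only).
cities = [
    {'name': 'Tallinn', 'tags': ['architecture', 'food', 'port', 'forest']},
    {'name': 'Riga', 'tags': ['food', 'architecture', 'forest']},
    {'name': 'Vilnius', 'tags': ['beer', 'food', 'shopping']},
    {'name': 'Kiev'},
]

def map_phase(record):
    """Emit one {tag: count} dict per record; emit nothing if there are no tags."""
    tags = record.get('tags', [])
    return [dict(Counter(tags))] if tags else []

def reduce_phase(values):
    """Merge a list of {tag: count} dicts into a one-element list of the same shape."""
    total = Counter()
    for partial in values:
        total.update(partial)  # adds counts key by key
    return [dict(total)]

# Map every record, then reduce in two batches and reduce again,
# mimicking a reduce step that sees partial results.
mapped = [out for record in cities for out in map_phase(record)]
partials = reduce_phase(mapped[:2]) + reduce_phase(mapped[2:])
final_counts = reduce_phase(partials)[0]

ranking = sorted(((n, tag) for tag, n in final_counts.items()), reverse=True)
print(ranking)
```

Skipping records without tags in the map step avoids the empty entry, and keeping the reduce input and output in the same shape is what makes the batched reduction give the same totals as a single pass.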