Mapper code:
#!/usr/bin/python
import sys
for line in sys.stdin:
    data = line.strip().split("\t")
    if len(data) == 6:
        date, time, place, temp, pressure, humidity = data
        print "{0}\t{1}".format(place, temp)
Reducer code:
#!/usr/bin/python
import sys
max_val = -sys.maxfloat
oldKey = None
for line in sys.stdin:
    data_mapped = line.strip().split("\t")
    if len(data_mapped) != 2:
        # Something has gone wrong. Skip this line.
        continue
    thisKey, thisVal = data_mapped
    if oldKey and oldKey != thisKey:
        print oldKey, "\t", max_val
        (oldKey, max_val) = (thisKey, float(thisVal))
    else:
        (oldKey, max_val) = (thisKey, max(max_val, float(thisVal)))
if oldKey != None:
    print oldKey, "\t", max_val
The reducer code does not run. I have tried everything: the mapper runs correctly, but the reducer does not.
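One thing stands out in the reducer as posted: the `sys` module has no `maxfloat` attribute (the float limits live in `sys.float_info`), so `max_val = -sys.maxfloat` raises an AttributeError before any input is read. That would explain why only the mapper appears to work, though it may not be the only problem. The sketch below is one possible fix, not a definitive one: it starts from `float("-inf")` instead and moves the loop into a helper so it can be tried without Hadoop. The names `reduce_max` and `main` are introduced here for illustration and are not part of the original script. It assumes the input is already sorted by key, as Hadoop streaming delivers it to a reducer.

```python
#!/usr/bin/env python
import sys


def reduce_max(lines):
    """Yield (key, max_value) for each run of equal keys.

    Hypothetical helper: assumes `lines` are "key<TAB>value" strings
    that are already sorted by key.
    """
    max_val = float("-inf")  # used instead of -sys.maxfloat, which does not exist
    old_key = None
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) != 2:
            # Malformed line, skip it.
            continue
        this_key, this_val = fields
        if old_key is not None and old_key != this_key:
            # Key changed: emit the previous key's maximum and reset.
            yield old_key, max_val
            max_val = float("-inf")
        old_key = this_key
        max_val = max(max_val, float(this_val))
    if old_key is not None:
        # Emit the maximum for the last key.
        yield old_key, max_val


def main():
    """Read mapper output from stdin and print one "place<TAB>max" line per key."""
    for key, value in reduce_max(sys.stdin):
        print("{0}\t{1}".format(key, value))
```

To use this as the streaming reducer, add a call to `main()` at the end of the script. The single-argument `print(...)` form is used so the same file runs under both Python 2 and Python 3.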
My dataset looks like this:
2012-01-01 09:00 San Jose 28 214.05 25
2012-01-01 09:00 Fort Worth 19 153.57 32
2012-01-01 09:00 San Diego 0 66.08 35
2012-01-01 09:00 Pittsburgh 18 493.51 28
2012-01-01 09:00 Omaha 10 235.63 32
2012-01-01 09:00 Stockton 28 247.18 32
2012-01-01 09:00 Austin 44 379.6 32
2012-01-01 09:00 New York 26 296.8 35
2012-01-01 09:00 Corpus Christi 32 25.38 28
2012-01-01 09:00 Fort Worth 32 213.88 32
2012-01-01 09:00 Las Vegas 21 53.26 32
2012-01-01 09:00 Newark 21 39.75 35
2012-01-01 09:00 Austin 44 469.63 32
2012-01-01 09:00 Greensboro 32 290.82 32
2012-01-01 09:00 San Francisco 0 260.65 28
2012-01-01 09:00 Lincoln 33 136.9 32
2012-01-01 09:00 Buffalo 19 483.82 32
2012-01-01 09:00 San Jose 19 215.82 35
2012-01-01 09:00 Boston 44 418.94 25
The columns are, in order: date, time, place, temperature, pressure, humidity.
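To check the logic against this column layout without a cluster, the map, sort and reduce steps can be imitated in a single process. This is only a rough sketch under one assumption: the fields in the real file are tab-separated (the mapper splits on "\t", and place names such as "San Jose" contain spaces). The names `SAMPLE`, `map_records` and `max_temp_by_place` are made up for this illustration.

```python
from itertools import groupby

# A few rows from the dataset above, with fields ASSUMED to be tab-separated.
SAMPLE = [
    "2012-01-01\t09:00\tSan Jose\t28\t214.05\t25",
    "2012-01-01\t09:00\tFort Worth\t19\t153.57\t32",
    "2012-01-01\t09:00\tFort Worth\t32\t213.88\t32",
    "2012-01-01\t09:00\tSan Jose\t19\t215.82\t35",
]


def map_records(lines):
    """Mapper step: yield (place, temp) for each well-formed six-field record."""
    for line in lines:
        data = line.strip().split("\t")
        if len(data) == 6:
            date, time, place, temp, pressure, humidity = data
            yield place, float(temp)


def max_temp_by_place(lines):
    """Imitate map -> sort -> reduce and return {place: maximum temperature}."""
    # sorted() stands in for the shuffle/sort that Hadoop does between the steps.
    pairs = sorted(map_records(lines))
    return {
        place: max(temp for _, temp in group)
        for place, group in groupby(pairs, key=lambda pair: pair[0])
    }


print(max_temp_by_place(SAMPLE))
```

For the four sample rows this gives 32.0 for Fort Worth and 28.0 for San Jose. If the real file turns out to be space-separated, `len(data) == 6` would never hold and the mapper would emit nothing, so that is worth checking too.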