My data looks like this:
20130101 12.8 9.6
20130102 10.1 3.8
20130103 7.0 -2.2
20130104 11.8 -3.7
20130105 8.6 -1.1
20130106 10.5 1.9
20130107 13.4 -0.1
20130108 16.2 1.4
20130109 17.8 12.4
20130110 20.0 16.2
20130111 15.4 5.0
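I want to identify the dates on which the maximum temperature is above 40 (a hot day) or the minimum temperature is below 10 (a cold day). To do this, I run the following code: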
current_date = None
current_temp = None
for line in data.strip().split('\n'):
    Mapper_data = ["%s\o%s\o%s" % (line.split(' ')[0], line.split(' ')[1], line.split(' ')[2])]
    for line in Mapper_data:
        line = line.strip()
        date, max_temp, min_temp = line.rsplit('\o', 2)
        try:
            max_temp = float(max_temp)
            min_temp = float(min_temp)
        except ValueError:
            continue
        if current_date == date:
            if max_temp > 40:
                current_temp = 'Hot day'
            if min_temp < 10:
                current_temp = 'Cold day'
        else:
            if current_date:
                print('%s\t%s' % (current_date, current_temp))
            if max_temp > 40:
                current_temp = 'Hot day'
            if min_temp < 10:
                current_temp = 'Cold day'
            current_date = date
if current_date == date:
    print('%s\t%s' % (current_date, current_temp))
I get the following result:
20130101 Cold day
20130102 Cold day
20130103 Cold day
20130104 Cold day
20130105 Cold day
20130106 Cold day
20130107 Cold day
20130108 Cold day
20130109 Cold day
20130110 Cold day
20130111 Cold day
But the result I need is:
20130101 Cold day
20130102 Cold day
20130103 Cold day
20130104 Cold day
20130105 Cold day
20130106 Cold day
20130107 Cold day
20130108 Cold day
20130111 Cold day
because 20130109 and 20130110 are neither cold nor hot.
If you have any idea how I can change my code to get this last result, please help.
Answer 0 (score: 0)
If you want a Hadoop-compatible Python script, it needs to read from STDIN.
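Here is an example that runs locally:
import sys

for line in sys.stdin:
    current_date, max_temp, min_temp = line.split()
    # Reset the label for every record so a previous day's label cannot carry over.
    condition = None
    try:
        f_min_temp = float(min_temp)
        f_max_temp = float(max_temp)
    except ValueError:
        # Skip lines whose temperatures are not numeric.
        continue
    if f_max_temp > 40:
        condition = 'Hot day'
    if f_min_temp < 10:
        condition = 'Cold day'
    # Print only the days that are hot or cold.
    if condition:
        print('%s\t%s' % (current_date, condition))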
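The reason the code in the question labels every date as cold is that current_temp is never reset, so 20130109 and 20130110 inherit the label from 20130108. A minimal sketch of the per-record reset, run against the sample data from the question instead of STDIN, might look like the following. The names classify, label_lines and sample are hypothetical helpers introduced only for this illustration, and the choice to let 'Cold day' win when both thresholds are crossed simply mirrors the order of the checks in the script above.

```python
def classify(max_temp, min_temp):
    """Return 'Cold day', 'Hot day', or None for one day's temperatures.

    Thresholds (40 and 10) come from the question. As in the script above,
    'Cold day' takes precedence if both conditions hold (an assumption).
    """
    if min_temp < 10:
        return 'Cold day'
    if max_temp > 40:
        return 'Hot day'
    return None


def label_lines(lines):
    """Yield (date, label) for each line that is hot or cold."""
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        date, max_temp, min_temp = fields
        try:
            label = classify(float(max_temp), float(min_temp))
        except ValueError:
            continue  # skip lines with non-numeric temperatures
        if label is not None:
            yield date, label


# Sample data copied from the question.
sample = """20130101 12.8 9.6
20130102 10.1 3.8
20130103 7.0 -2.2
20130104 11.8 -3.7
20130105 8.6 -1.1
20130106 10.5 1.9
20130107 13.4 -0.1
20130108 16.2 1.4
20130109 17.8 12.4
20130110 20.0 16.2
20130111 15.4 5.0"""

results = list(label_lines(sample.splitlines()))
for date, label in results:
    print('%s\t%s' % (date, label))
```

Because the label is computed fresh for each line, 20130109 and 20130110 produce no output, which matches the result the question asks for.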
To run it in Hadoop, see Hadoop Streaming.