Ciao, here is my reproducible example.
a=c(1,2,3,4,5,6)
a1=c(15,17,17,16,14,15)
a2=c(0,0,1,1,1,0)
b=c(1,0,NA,NA,0,NA)
c=c(2010,2010,2010,2010,2010,2010)
d=c(1,1,0,1,0,NA)
e=c(2012,2012,2012,2012,2012,2012)
f=c(1,0,0,0,0,NA)
g=c(2014,2014,2014,2014,2014,2014)
h=c(1,1,0,1,0,NA)
i=c(2010,2012,2014,2012,2014,2014)
mydata = data.frame(a,a1,a2,b,c,d,e,f,g,h,i)
names(mydata) = c("id","age","gender","drop1","year1","drop2","year2","drop3","year3","drop4","year4")
mydata2 <- reshape(mydata, direction = "long",
                   varying = list(c("year1","year2","year3","year4"),
                                  c("drop1","drop2","drop3","drop4")),
                   v.names = c("year", "drop"),
                   idvar = "X", timevar = "Year", times = c(1:4))
library(dplyr)

# First row per id where drop == 1
x1 = mydata2 %>%
  group_by(id) %>%
  slice(which(drop == 1)[1])
# First row per id where drop == 0
x2 = mydata2 %>%
  group_by(id) %>%
  slice(which(drop == 0)[1])
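I have a tall (long-format) data set "mydata2" in which every ID has many rows.
I want to create a new data set "x" in which every ID has exactly one row, based on whether the ID dropped or not. For the first of drop1 drop2 drop3 drop4 that equals 1, I want to take the matching year and put it in the variable dropYEAR. If none of drop1 drop2 drop3 drop4 equals 1, I want to put the last data point of year1 year2 year3 year4 into dropYEAR.
Every ID should end up with 1 row, and I want to create 2 new columns: didDROP equals 1 if the ID ever dropped and 0 if it never dropped; dropYEAR equals the year of the drop if didDROP equals 1, or the last reported value of year1 year2 year3 year4 if the ID never dropped. I tried to do this in dplyr, but it gives only part of what I want because it gets rid of the IDs whose didDROP would be 0.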
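To make the problem concrete, here is a minimal base R sketch of what the index expression in the attempt evaluates to. The two vectors are illustrative and copy the drop values of ids 3 and 2 from the example; the names drop_never, drop_once, idx_never and idx_once are made up for this sketch. When no value equals 1, which() returns an empty vector and [1] turns that into NA, which is consistent with those ids disappearing after slice().

```r
# drop values for an id that never drops (id 3) and one that does (id 2)
drop_never <- c(NA, 0, 0, 0)
drop_once  <- c(0, 1, 0, 1)

# No element equals 1: which() is empty, so taking [1] yields NA
idx_never <- which(drop_never == 1)[1]

# First element equal to 1 sits at position 2
idx_once <- which(drop_once == 1)[1]

print(idx_never)
print(idx_once)
```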
Answer 0 (score: 1)
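First sort with mydata2 %>% arrange(id) to get a feel for the dataset. Then, using dplyr's first and last, we can pull the first year where drop == 1 or, if drop never equals 1, the last year where drop is not NA. didDROP is computed with case_when, because it handles NA conditions gracefully.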
library(dplyr)
mydata2 %>% group_by(id) %>%
  mutate(dropY = first(year[!is.na(drop) & drop == 1]),
         dropYEAR = if_else(is.na(dropY), last(year[!is.na(drop)]), dropY)) %>%
  slice(1)
# Update
mydata2 %>% group_by(id) %>%
  mutate(dropY = first(year[!is.na(drop) & drop == 1]),
         dropYEAR = if_else(is.na(dropY), last(year), dropY),
         # 1 if any drop equals 1, otherwise 0 (an NA condition falls through to TRUE ~ 0)
         didDROP = case_when(any(drop == 1) ~ 1,
                             TRUE ~ 0)) %>%
  select(-dropY) %>% slice(1)
# A tibble: 6 x 9
# Groups:   id [6]
     id   age gender  Year  year  drop     X dropYEAR didDROP
  <dbl> <dbl>  <dbl> <int> <dbl> <dbl> <int>    <dbl>   <dbl>
1     1    15      0     1  2010     1     1     2010       1
2     2    17      0     1  2010     0     2     2012       1
3     3    17      1     1  2010    NA     3     2014       0
4     4    16      1     1  2010    NA     4     2012       1
5     5    14      1     1  2010     0     5     2014       0
6     6    15      0     1  2010    NA     6     2014       0
I hope this is what you are looking for.
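As a side note on the NA handling mentioned in this answer, here is a minimal sketch. The vector drop_id3 is illustrative (the drop values of id 3 in the example) and flag is a name made up here. any(drop == 1) evaluates to NA rather than FALSE when no value equals 1 and at least one is missing; case_when() treats an NA condition as not matched, so the TRUE ~ 0 branch applies.

```r
library(dplyr)

# drop values of id 3 in the example: one missing value, no 1
drop_id3 <- c(NA, 0, 0, 0)

# NA == 1 is NA, so any() cannot decide and returns NA
res_plain <- any(drop_id3 == 1)

# With na.rm = TRUE the missing value is ignored and the result is FALSE
res_narm <- any(drop_id3 == 1, na.rm = TRUE)

# case_when() skips the NA condition and falls through to the default branch
flag <- case_when(any(drop_id3 == 1) ~ 1,
                  TRUE ~ 0)

print(res_plain)
print(res_narm)
print(flag)
```

If an explicit FALSE is preferred over relying on that fall-through, any(drop == 1, na.rm = TRUE) is one way to get it.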
Answer 1 (score: 1)
You can sort by id, drop and year, and then pick a row conditionally on whether the ID dropped or not:
car_dict = {'mercedes': 200, 'fiat': 100, 'porsche': 300, 'rocketcar': 600}
min_car = {}
# Track the slowest car seen so far
for car in car_dict:
    if not min_car or car_dict[car] < min_car['speed']:
        min_car['name'] = car
        min_car['speed'] = car_dict[car]
print(min_car)
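The sort-then-pick idea described at the start of this answer could be sketched in R with dplyr roughly as follows. This is only one possible reading of that sentence, not the answerer's own code. The data frame long is a hypothetical long-format stand-in for mydata2, rebuilt from the values in the question, and the helper column dropped is introduced purely for illustration.

```r
library(dplyr)

# Hypothetical stand-in for mydata2: one row per id and wave (Year = 1..4)
long <- data.frame(
  id   = rep(1:6, times = 4),
  Year = rep(1:4, each = 6),
  year = c(rep(2010, 6), rep(2012, 6), rep(2014, 6),
           2010, 2012, 2014, 2012, 2014, 2014),
  drop = c(1, 0, NA, NA, 0, NA,
           1, 1, 0, 1, 0, NA,
           1, 0, 0, 0, 0, NA,
           1, 1, 0, 1, 0, NA)
)

x <- long %>%
  # TRUE only where drop is observed and equals 1
  mutate(dropped = !is.na(drop) & drop == 1) %>%
  # Within each id: rows with a drop first, then in wave order
  arrange(id, desc(dropped), Year) %>%
  group_by(id) %>%
  summarise(
    didDROP  = as.integer(first(dropped)),
    # Earliest wave with a drop if there is one, otherwise the last wave's year
    dropYEAR = if (first(dropped)) first(year) else last(year),
    .groups  = "drop"
  )

print(x)
```

On the example data this gives one row per id, with the same dropYEAR and didDROP values as the tibble shown in Answer 0.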