I have a sample data frame `data` as follows:
X Y Month Year income
2281205 228120 3 2011 1000
2281212 228121 9 2010 1100
2281213 228121 12 2010 900
2281214 228121 3 2011 9000
2281222 228122 6 2010 1111
2281223 228122 9 2010 3000
2281224 228122 12 2010 1889
2281225 228122 3 2011 778
2281243 228124 12 2010 1111
2281244 228124 3 2011 200
2281282 228128 9 2010 7889
2281283 228128 12 2010 2900
2281284 228128 3 2011 3400
2281302 228130 9 2010 1200
2281303 228130 12 2010 2000
2281304 228130 3 2011 1900
2281352 228135 9 2010 2300
2281353 228135 12 2010 1333
2281354 228135 3 2011 2340
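To make the example reproducible, the table above can be read into R directly. This is only a minimal sketch: it assumes the columns are whitespace-separated exactly as shown and keeps the name `data` used in the question.

```r
# Rebuild the sample data frame from the table in the question.
# read.table() with text= parses the whitespace-separated rows;
# header = TRUE takes the column names from the first line.
data <- read.table(header = TRUE, text = "
X Y Month Year income
2281205 228120 3 2011 1000
2281212 228121 9 2010 1100
2281213 228121 12 2010 900
2281214 228121 3 2011 9000
2281222 228122 6 2010 1111
2281223 228122 9 2010 3000
2281224 228122 12 2010 1889
2281225 228122 3 2011 778
2281243 228124 12 2010 1111
2281244 228124 3 2011 200
2281282 228128 9 2010 7889
2281283 228128 12 2010 2900
2281284 228128 3 2011 3400
2281302 228130 9 2010 1200
2281303 228130 12 2010 2000
2281304 228130 3 2011 1900
2281352 228135 9 2010 2300
2281353 228135 12 2010 1333
2281354 228135 3 2011 2340
")

# Quick structural check: 19 rows, 5 integer columns.
str(data)
```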
I use ddply (from the plyr package) to compute the total income for each Y:
library(plyr)
x <- ddply(data, .(Y), summarize, freq=length(Y), tot=sum(income))
Now, I also need to find the X for each Y according to the following conditions:
a. If Y has observations for months 9 (2010), 12 (2010), and 3 (2011), then X corresponds to month 9 (2010), i.e. for Y = 228121, X = 2281212.
b. If Y has observations for months 6 (2010), 9 (2010), 12 (2010), and 3 (2011), then X corresponds to month 6 (2010), i.e. for Y = 228122, X = 2281222.
c. If Y has observations for months 12 (2010) and 3 (2011), then X corresponds to month 12 (2010), i.e. for Y = 228124, X = 2281243.
d. If Y has only one observation, then X corresponds to the month of that observation, i.e. for Y = 228120, X = 2281205.
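The point is that when a Y has multiple observations, I choose the X corresponding to month 6 (2010) if it is available; if it is not, I choose the month closest to 6 (2010), e.g. 9 (2010). Note that if a Y has only one observation, I choose the X of that observation.
Please suggest how to incorporate these conditions into ddply.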
Answer 0 (score: 2)
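This solution assumes, as @DWin suggested, that the earliest X value should be chosen. The Month and Year variables are converted to a date format, and the earliest date is then used as the condition for selecting X.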
library(zoo) #necessary for date manipulation
x <- ddply(data, .(Y), summarize, freq=length(Y), tot=sum(income),
X=X[as.yearmon(paste(Month,Year,sep="/"),format="%m/%Y")==min(as.yearmon(paste(Month,Year,sep="/"),format="%m/%Y"))])
Y freq tot X
1 228120 1 1000 2281205
2 228121 3 11000 2281212
3 228122 4 6778 2281222
4 228124 2 1311 2281243
5 228128 3 14189 2281282
6 228130 3 5100 2281302
7 228135 3 5973 2281352
To instead select the X whose month is closest to June 2010, compute each row's distance from 2010-06 and pick the smallest:
library(zoo)
# new column containing dates built from Month and Year
data$Time <- as.Date(as.yearmon(paste(data$Month, data$Year, sep="/"), format="%m/%Y"))
# absolute difference in days between the new date column and 2010-06-01
data$Time.dif <- abs(as.numeric(data$Time - as.Date("2010-06-01")))
# select the X where Time.dif is smallest (0 for 2010-06)
x <- ddply(data, .(Y), summarize, freq=length(Y), tot=sum(income),
           X=X[Time.dif==min(Time.dif)])
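For comparison, the same "closest to June 2010" rule can be sketched in base R, without plyr or zoo. This is an illustration, not part of the original answer: the names `months_from_target`, `pick`, and `result` are made up here, the distance is measured in whole months rather than days, and a tie is broken by taking the first matching row, which the question does not specify. Only three of the sample groups are used, to keep the block short.

```r
# Subset of the sample rows: a single-observation group (228120),
# a group that contains June 2010 (228122), and one that does not (228124).
data <- read.table(header = TRUE, text = "
X Y Month Year income
2281205 228120 3 2011 1000
2281222 228122 6 2010 1111
2281223 228122 9 2010 3000
2281224 228122 12 2010 1889
2281225 228122 3 2011 778
2281243 228124 12 2010 1111
2281244 228124 3 2011 200
")

# Distance in months from June 2010 (0 for 6/2010, 3 for 9/2010, ...).
data$months_from_target <- abs((data$Year - 2010) * 12 + (data$Month - 6))

# Per-group summary: row count, total income, and the X closest to June 2010.
# which.min() returns the first minimum, so each group yields exactly one X
# even if two rows are equally close (an assumption about tie-breaking).
pick <- function(d) {
  data.frame(Y = d$Y[1],
             freq = nrow(d),
             tot = sum(d$income),
             X = d$X[which.min(d$months_from_target)])
}

# Split by Y, summarise each piece, and stack the results.
result <- do.call(rbind, lapply(split(data, data$Y), pick))
rownames(result) <- NULL
print(result)
```

Using `which.min()` rather than `==min(...)` guarantees a single X per group, which matters because `summarize` expects one value per output column.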