Looping over a data frame to add a column that depends on values in other columns

Asked: 2013-08-20 13:28:54

Tags: r

I am very new to R, as I'm sure will be obvious from my question.

I have a data frame (d) that looks like this:

 dput(d[1:24,])
structure(list(year = c(1967, 1967, 1967, 1967, 1967, 1967, 1967, 
1967, 1968, 1968, 1968, 1968, 1968, 1968, 1968, 1968, 1968, 1968, 
1968, 1968, 1969, 1969, 1969, 1969), month = c(5, 6, 7, 8, 9, 
10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4
), temp = c(16.545, 20.2275, 24.9425, 24.704, 21.5625, 20.3833333333333, 
18.085, 16.325, 13.725, 13.095, 13.07, 15.2525, 16.4933333333333, 
20.64, 23.0375, 22.4766666666667, 21.1975, 20.458, 17.9725, 16.1866666666667, 
13.78, 13.155, 12.822, 14.0666666666667), date = structure(c(-976, 
-945, -915, -884, -853, -823, -792, -762, -731, -700, -671, -640, 
-610, -579, -549, -518, -487, -457, -426, -396, -365, -334, -306, 
-275), class = "Date")), .Names = c("year", "month", "temp", 
"date"), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", 
"20", "21", "22", "23", "24"), class = "data.frame")

From this I computed and stored the mean of "temp" for each month:

jan <- 13.80588
feb <- 13.31874
mar <- 13.35263
apr <- 14.31068
may <- 17.00249
jun <- 20.55553
jul <- 23.55765
aug <- 24.55040
sep <- 22.56809
oct <- 20.15921
nov <- 17.70971
dec <- 15.41233

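From each value in the "temp" column I want to subtract the mean of the corresponding month and add the result to a new column, i.e. something like d[which(d$month == 1), 5] <- c(d$temp - jan). That is, for every row where month == 1, subtract jan from the temp value in the same row.

I tried to do this with a for loop: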

for (i in 1:nrow(d)){
  d[which(d$month[i]==1),5]<-c(d$temp[i] - jan)
  d[which(d$month[i]==2),5]<-c(d$temp[i] - feb)
  d[which(d$month[i]==3),5]<-c(d$temp[i] - mar)
  d[which(d$month[i]==4),5]<-c(d$temp[i] - apr)
  d[which(d$month[i]==5),5]<-c(d$temp[i] - may)
  d[which(d$month[i]==6),5]<-c(d$temp[i] - jun)
  d[which(d$month[i]==7),5]<-c(d$temp[i] - jul)
  d[which(d$month[i]==8),5]<-c(d$temp[i] - aug)
  d[which(d$month[i]==9),5]<-c(d$temp[i] - sep)
  d[which(d$month[i]==10),5]<-c(d$temp[i] - oct)
  d[which(d$month[i]==11),5]<-c(d$temp[i] - nov)
  d[which(d$month[i]==12),5]<-c(d$temp[i] - dec)
}

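There were 50 or more warnings (use warnings() to see the first 50)

The result is that the correct month is picked, but instead of the corresponding temp entry, R uses the temp value from the first row in every calculation. I'm sure there must be a simpler way!

Thanks in advance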

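1 Answer:

Answer 0: (score: 2)

R gives you much simpler ways to do this kind of thing. You can skip creating all the jan, feb, etc. variables and just use ddply, which splits the data frame into chunks (here, one per month), applies a function to each chunk, and recombines the results: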

df = structure(list(year = c(1967, 1967, 1967, 1967, 1967, 1967, 1967, 1967, 1968, 1968, 1968, 1968, 1968, 1968, 1968, 1968, 1968, 1968, 1968, 1968, 1969, 1969, 1969, 1969), month = c(5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4 ), temp = c(16.545, 20.2275, 24.9425, 24.704, 21.5625, 20.3833333333333, 18.085, 16.325, 13.725, 13.095, 13.07, 15.2525, 16.4933333333333, 20.64, 23.0375, 22.4766666666667, 21.1975, 20.458, 17.9725, 16.1866666666667, 13.78, 13.155, 12.822, 14.0666666666667), date = structure(c(-976, -945, -915, -884, -853, -823, -792, -762, -731, -700, -671, -640, -610, -579, -549, -518, -487, -457, -426, -396, -365, -334, -306, -275), class = "Date")), .Names = c("year", "month", "temp", "date"), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24"), class = "data.frame")

library(plyr)
df = ddply(
  df,
  .(month),
  transform,
  temp_normed=temp-mean(temp)
)

This is an example of the split-apply-combine strategy, for which the plyr package is very useful.

Output:

> df
   year month     temp       date temp_normed
1  1968     1 13.72500 1968-01-01 -0.02750000
2  1969     1 13.78000 1969-01-01  0.02750000
3  1968     2 13.09500 1968-02-01 -0.03000000
4  1969     2 13.15500 1969-02-01  0.03000000
5  1968     3 13.07000 1968-03-01  0.12400000
6  1969     3 12.82200 1969-03-01 -0.12400000
7  1968     4 15.25250 1968-04-01  0.59291667
8  1969     4 14.06667 1969-04-01 -0.59291667
9  1967     5 16.54500 1967-05-01  0.02583333
10 1968     5 16.49333 1968-05-01 -0.02583333
11 1967     6 20.22750 1967-06-01 -0.20625000
12 1968     6 20.64000 1968-06-01  0.20625000
13 1967     7 24.94250 1967-07-01  0.95250000
14 1968     7 23.03750 1968-07-01 -0.95250000
15 1967     8 24.70400 1967-08-01  1.11366667
16 1968     8 22.47667 1968-08-01 -1.11366667
17 1967     9 21.56250 1967-09-01  0.18250000
18 1968     9 21.19750 1968-09-01 -0.18250000
19 1967    10 20.38333 1967-10-01 -0.03733333
20 1968    10 20.45800 1968-10-01  0.03733333
21 1967    11 18.08500 1967-11-01  0.05625000
22 1968    11 17.97250 1968-11-01 -0.05625000
23 1967    12 16.32500 1967-12-01  0.06916667
24 1968    12 16.18667 1968-12-01 -0.06916667
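
For readers who would rather not depend on plyr, the same subtraction can probably be done in base R. The following is only a sketch on a four-row subset of the question's data: Option 1 uses ave() to get per-group means computed from the rows at hand, while Option 2 reuses the monthly means already stored in the question through a lookup vector. The names monthly_means, temp_normed and temp_anomaly are illustrative assumptions, not part of the original post.

```r
# Small illustrative data frame (values copied from the question's data).
d <- data.frame(
  year  = c(1968, 1969, 1968, 1969),
  month = c(1, 1, 2, 2),
  temp  = c(13.725, 13.78, 13.095, 13.155)
)

# Option 1: ave() returns the group mean repeated for every row,
# so the subtraction is vectorised and the original row order is kept.
d$temp_normed <- d$temp - ave(d$temp, d$month)

# Option 2: reuse precomputed monthly means (the jan..dec values from the
# question), stored in one vector and indexed by month number.
# 'monthly_means' is a hypothetical name introduced for this sketch.
monthly_means <- c(13.80588, 13.31874, 13.35263, 14.31068, 17.00249, 20.55553,
                   23.55765, 24.55040, 22.56809, 20.15921, 17.70971, 15.41233)
d$temp_anomaly <- d$temp - monthly_means[d$month]

print(d)
```

Option 1 centres each month on the mean of the rows present in the data frame, which is what the ddply call above does. Option 2 centres on means computed elsewhere (for example, from the full data set), which may be closer to what the question asked for.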