I created an artificial dataset:
x <- rnorm(100, 10, 10)
y <- rnorm(100, 20, 10)
Location <- rep(c("AB", "TA", "OP"), times = c(40, 30, 30))
Year <- rep(rep(c("1999", "2000", "2001", "2002"), times = 3),
            times = c(10, 9, 12, 9, 7, 6, 6, 11, 12, 8, 5, 5))
Data <- cbind(x, y, Location, Year)
> head(Data)
x y Location Year
[1,] "1.8938661556415" "19.851256070398" "AB" "1999"
[2,] "21.0735971323312" "17.4993965352294" "AB" "1999"
[3,] "30.8347289164302" "7.63333686308105" "AB" "1999"
[4,] "8.913993138201" "14.7085296541221" "AB" "1999"
[5,] "20.8309225677419" "12.0888505284667" "AB" "1999"
[6,] "25.3978549194374" "20.47154776064" "AB" "1999"
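Note the quotes in the output above: cbind() builds a matrix, and a matrix holds a single type, so x and y have been turned into character strings. A minimal sketch of the difference, using two hypothetical rows rather than the question's random data; data.frame() keeps a separate type per column, which matters if Data is later handed to a grouped function such as by():

```r
# Hypothetical stand-ins for the first two rows of the question's data
x <- c(1.89, 21.07)
y <- c(19.85, 17.50)
Location <- c("AB", "AB")
Year <- c("1999", "1999")

# cbind() creates one matrix with a single type, so the numbers become strings
m <- cbind(x, y, Location, Year)
print(typeof(m))   # [1] "character"

# data.frame() stores each column with its own type, so x and y stay numeric
d <- data.frame(x, y, Location, Year)
print(is.numeric(d$x))
```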
I want to take the atan2 of each pair of consecutive x and y points, e.g.:
Theta<-atan2(y[i+1]-y[i],x[i+1]-x[i])
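But I only want to do this within each year: I don't want to compute it between 1999 and 2000, or between 2001 and 2002, and so on. The angles should only be computed between x and y points from the same year and Location.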
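For a single group of points, the formula above is atan2 of consecutive differences, which diff() produces without an explicit index. A small sketch; step_angles is a hypothetical helper name, not part of the question:

```r
# Angle from each point to the next one, for one group of points.
# diff(y)[i] is y[i + 1] - y[i], so this matches
# atan2(y[i+1] - y[i], x[i+1] - x[i]) for i = 1, ..., n - 1.
step_angles <- function(x, y) {
  atan2(diff(y), diff(x))
}

# Three points: one step to the right, then one step up
step_angles(c(0, 1, 1), c(0, 0, 1))
# [1] 0.000000 1.570796
```

A group of n points therefore yields n - 1 angles, which is why any per-group approach has to decide what to put in the last row of each group.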
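I originally wrote a loop that does the above across all of the data (which is what I don't want), and I'd like to know whether anyone knows how to change it so that the loop stops and resets itself for each year. The original loop is: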
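Theta <- numeric(length(x) - 1)
# parentheses matter: 1:length(x)-1 is (1:length(x)) - 1, i.e. 0:99
for (i in 1:(length(x) - 1)) {
  Theta[i] <- atan2(y[i + 1] - y[i], x[i + 1] - x[i])
}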
Any help?
Answer 0 (score: 1)
You can try this.
# a smaller test data set
x <- rnorm(24, 10, 10)
y <- rnorm(24, 20, 10)
loc <- rep(c("A", "B"), each = 4)
year <- rep(1999:2001, each = 8)
df <- data.frame(x, y, loc, year)
df
# apply function on subsets defined by location and year
# use tail and head to 'lag' y and x
by(df, df[ , c("loc", "year")], function(d) {
  with(d, atan2(y = tail(y, -1) - head(y, -1), x = tail(x, -1) - head(x, -1)))
})
# loc: A
# year: 1999
# [1] 2.306794 -2.363359 1.065151
# ---------------------------------------------------------------------------
# loc: B
# year: 1999
# [1] -1.077345 1.161944 -2.101823
# ---------------------------------------------------------------------------
# loc: A
# year: 2000
# [1] -1.76557207 1.79463661 -0.05251002
# ---------------------------------------------------------------------------
# loc: B
# year: 2000
# [1] 2.753115 -1.468055 -1.624389
# ...snip...
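For comparison, a sketch of the loop from the question changed so that it "resets" at each year: it only computes an angle when the next row has the same Year and Location, and leaves NA otherwise. This assumes, as in the question's data, that rows of the same year and location are contiguous; the small vectors below are made up for illustration:

```r
# Hypothetical small example; Year and Location play the same roles as in the question
x <- c(0, 1, 1, 5, 6)
y <- c(0, 0, 1, 5, 5)
Location <- c("AB", "AB", "AB", "AB", "AB")
Year <- c("1999", "1999", "1999", "2000", "2000")

n <- length(x)
Theta <- rep(NA_real_, n)   # NA means "no next point in the same group"
for (i in seq_len(n - 1)) {
  # only pair point i with point i + 1 if both share the year and location
  if (Year[i + 1] == Year[i] && Location[i + 1] == Location[i]) {
    Theta[i] <- atan2(y[i + 1] - y[i], x[i + 1] - x[i])
  }
}
Theta
# values: 0, pi/2, NA (year changes), 0, NA (last point)
```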
dplyr alternative. Computing the differences with tail() and head() gives a result of length n - 1 in each group, which is neither the group size nor 1, so dplyr's mutate() refuses to chew on it (see here and here). Handing dplyr a data.table instead only hides the problem: the shorter result gets recycled, so the last row of each group silently repeats that group's first angle. Using lead(), which shifts a column up by one within each group and pads the group's last row with NA, keeps the result at the group size. Of course, a pure data.table solution would be the most concise. I'll leave that to someone more familiar with data.table than I am...
library(dplyr)
# lead() takes the next value within each group and gives the
# group's last row NA, so atan has the same length as the group
df %>%
  group_by(loc, year) %>%
  mutate(atan = atan2(lead(y) - y, lead(x) - x)) %>%
  ungroup() %>%
  arrange(loc, year) %>%
  as.data.frame()
# x y loc year atan
# 1 19.826573 18.354265 A 1999 2.30679446
# 2 11.856696 27.153197 A 1999 -2.36335869
# 3 -3.362242 12.150775 A 1999 1.06515149
# 4 11.126841 38.320662 A 1999 NA
# 5 12.616396 31.782969 A 2000 -1.76557207
# 6 8.492305 10.877870 A 2000 1.79463661
# 7 4.921766 26.561845 A 2000 -0.05251002
# 8 14.398730 26.063752 A 2000 NA
# 9 11.800173 30.215422 A 2001 -2.74907150
# 10 -6.473259 22.650127 A 2001 0.11997030
# 11 6.528055 24.217425 A 2001 -1.71122202
# 12 4.951238 13.062497 A 2001 NA
# 13 1.640049 19.886848 B 1999 -1.07734532
# 14 4.123603 15.269110 B 1999 1.16194418
# 15 14.548780 39.330885 B 1999 -2.10182331
# 16 6.925468 26.350556 B 1999 NA
# ...snip...
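If neither dplyr nor data.table is available, base R's ave() can produce the same per-group column. A sketch under these assumptions: the test data mirror the answer's (with set.seed() added so runs are reproducible), and passing row indices to ave() is just one way to let FUN see both x and y:

```r
# Same kind of small test data as in the answer
set.seed(1)
x <- rnorm(24, 10, 10)
y <- rnorm(24, 20, 10)
loc <- rep(c("A", "B"), each = 4)
year <- rep(1999:2001, each = 8)
df <- data.frame(x, y, loc, year)

# ave() splits its first argument by the grouping variables, applies FUN
# to each piece and writes the results back in the original row order.
# Passing row indices lets FUN look up both x and y for the group.
df$atan <- ave(seq_len(nrow(df)), df$loc, df$year, FUN = function(i) {
  # angle to the next point in the same group; the group's last row gets NA
  c(atan2(diff(df$y[i]), diff(df$x[i])), NA)
})
head(df, 8)
```

Because ave() returns one value per row in the original order, no re-sorting is needed, and the last row of each loc/year group is NA.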