I have a data frame of this form:
df <- data.frame(country = rep(x = LETTERS[1:4], each = 5), year = rep(2001:2005), C=runif(20,30,100), Z=rnorm(20, mean = 0, sd = 1))
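For each country, I want to find the value of Z where year == 2003 and divide every value of C by it. Each country's C values are therefore divided by a different number, but that number is the same within a country. The result should be saved in a new vector "New". For example, for country A all values of C would be divided by -0.80212515, for country B all values of C would be divided by -0.62305076, and so on. How can I do this? Thanks!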
Answer 0 (score: 1)
Use split to process each country's subset separately, then combine the results:
r = sapply(split(df, df$country), function(x) x$C / x$Z[x$year == 2003])
d = tidyr::gather(as.data.frame(r), Country, New)
Edit:
set.seed(0)
df <- data.frame(country = rep(x = LETTERS[1:4], each = 5), year = rep(2001:2005), C=runif(20,30,100), Z=rnorm(20, mean = 0, sd = 1))
r = sapply(split(df, df$country), function(x) x$C / x$Z[x$year == 2003])
d=tidyr::gather(as.data.frame(r),country, New)
cbind(df, d)
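The sapply/gather route relies on every country having the same number of rows and on the gathered values lining up with the rows of df. As a hedged alternative sketch of the same split-and-combine idea, base R's unsplit() can put the per-country results back in the original row order. The name `pieces` is only illustrative:

```r
set.seed(0)
df <- data.frame(country = rep(LETTERS[1:4], each = 5),
                 year = rep(2001:2005, times = 4),
                 C = runif(20, 30, 100),
                 Z = rnorm(20, mean = 0, sd = 1))

# Split the rows by country and divide each group's C
# by that group's Z value for 2003
pieces <- lapply(split(df, df$country),
                 function(x) x$C / x$Z[x$year == 2003])

# unsplit() reassembles the per-group vectors in the original row order
df$New <- unsplit(pieces, df$country)
```

Because the result is written straight into df$New, no separate cbind step is needed.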
Answer 1 (score: 1)
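Your data does not match the example values you shared in your post; you need to use set.seed() to make it reproducible. Anyway, here is one way using dplyr: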
library(dplyr)
set.seed(42)
df <- data.frame(country = rep(x = LETTERS[1:4], each = 5),
                 year = rep(2001:2005),
                 C = runif(20, 30, 100),
                 Z = rnorm(20, mean = 0, sd = 1))
# Within each country, divide C by that country's Z value for 2003
df %>%
  group_by(country) %>%
  mutate(
    New = C / Z[year == 2003]
  ) %>%
  pull(New)
# [1] -67.70760 -68.83000 -36.02216 -63.45585 -53.94507 -24.97189 -30.70301
# [8] -14.84183 -28.60558 -29.87234 -360.88226 -467.30510 -555.07518 -278.50602
# [15] -362.73532 -54.33474 -55.85181 -21.67929 -35.87291 -39.26086
Answer 2 (score: 1)
A data.table alternative to @Shree's dplyr answer:
library(data.table)
set.seed(42)
dt <- data.table(country = rep(x = LETTERS[1:4], each = 5), year = rep(2001:2005), C = runif(20, 30, 100), Z = rnorm(20, mean = 0, sd = 1))
# Add New by reference, computed separately within each country
dt[, New := C / Z[year == 2003], by = "country"]
dt
# country year C Z New
# 1: A 2001 94.03642 1.3048697 -67.70760
# 2: A 2002 95.59528 2.2866454 -68.83000
# 3: A 2003 50.02977 -1.3888607 -36.02216
# 4: A 2004 88.13133 -0.2787888 -63.45585
# 5: A 2005 74.92219 -0.1333213 -53.94507
# 6: B 2001 66.33672 0.6359504 -24.97189
# 7: B 2002 81.56118 -0.2842529 -30.70301
# 8: B 2003 39.42666 -2.6564554 -14.84183
# 9: B 2004 75.98946 -2.4404669 -28.60558
# 10: B 2005 79.35453 1.3201133 -29.87234
# 11: C 2001 62.04192 -0.3066386 -360.88226
# 12: C 2002 80.33786 -1.7813084 -467.30510
# 13: C 2003 95.42706 -0.1719174 -555.07518
# 14: C 2004 47.88002 1.2146747 -278.50602
# 15: C 2005 62.36050 1.8951935 -362.73532
# 16: D 2001 95.80102 -0.4304691 -54.33474
# 17: D 2002 98.47585 -0.2572694 -55.85181
# 18: D 2003 38.22412 -1.7631631 -21.67929
# 19: D 2004 63.24980 0.4600974 -35.87291
# 20: D 2005 69.22329 -0.6399949 -39.26086
And an option that depends on neither data.table nor dplyr:
do.call(rbind,
by(df, df$country, FUN = function(a) transform(a, New = C/Z[year==2003])))
Answer 3 (score: 1)
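Another solution, using base R.
Extract the Z value for 2003 for each country:
v1 <- df[df$year == 2003, "Z"]
Create a divisor vector of the right length (this assumes the rows are sorted by country, with 5 rows per country):
z_2003 <- rep(x = v1[1:4], each = 5)
New vector: C divided by the Z value for the chosen year:
new <- df$C / z_2003
If you want to combine the new columns with the old data frame:
df2 <- cbind(df, z_2003, new)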
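The rep() step above only works when the rows are sorted by country and every country has exactly five rows. A minimal sketch that drops that assumption looks up each row's divisor with match(). The helper name `ref` is illustrative, not from the original answer:

```r
set.seed(42)
df <- data.frame(country = rep(LETTERS[1:4], each = 5),
                 year = rep(2001:2005, times = 4),
                 C = runif(20, 30, 100),
                 Z = rnorm(20, mean = 0, sd = 1))

# One row per country: the reference row for 2003
ref <- df[df$year == 2003, c("country", "Z")]

# For each row of df, find its country's reference row and divide by that Z
df$New <- df$C / ref$Z[match(df$country, ref$country)]
```

This version gives the same result if df is shuffled first, since the lookup is by country name rather than by position.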