New vector from the values of other vectors in R

Time: 2018-10-31 00:06:37

Tags: r vector

I have a data frame of this form:

df <- data.frame(country = rep(x = LETTERS[1:4], each = 5), year = rep(2001:2005), C=runif(20,30,100), Z=rnorm(20, mean = 0, sd = 1))

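For each country, I want to find the value of Z where year == 2003 and divide all of that country's C values by it. Each country's C values are therefore divided by a different number, but the number is the same within a country. I want to save the result in a new vector "New". For example, for country A all values of C would be divided by -0.80212515, for country B all values of C would be divided by -0.62305076, and so on. How can I do this? Thanks!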

4 answers:

Answer 0: (score: 1)

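Use split to process each country's subset separately, then combine the results:

# divide C (not Z) by the country's 2003 value of Z
r=sapply(split(df, df$country), function(x) x$C/x$Z[x$year==2003])
d=tidyr::gather(as.data.frame(r),country, New)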

Edit

set.seed(0)
df <- data.frame(country = rep(x = LETTERS[1:4], each = 5), year = rep(2001:2005), C=runif(20,30,100), Z=rnorm(20, mean = 0, sd = 1))

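# divide C (not Z) by the country's 2003 value of Z
r=sapply(split(df, df$country), function(x) x$C/x$Z[x$year==2003])
d=tidyr::gather(as.data.frame(r),country, New)

# bind only New, so the country column is not duplicated
cbind(df, New = d$New)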
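
The same split, process, combine idea can also be sketched in base R alone with split() and unsplit(). This is a minimal sketch that assumes the df layout from the question; the names pieces and res are illustrative and not part of the original answer.

```r
set.seed(0)
df <- data.frame(country = rep(LETTERS[1:4], each = 5),
                 year = rep(2001:2005, times = 4),
                 C = runif(20, 30, 100),
                 Z = rnorm(20, mean = 0, sd = 1))

# split into one data frame per country
pieces <- split(df, df$country)

# within each piece, divide C by that country's Z from 2003
pieces <- lapply(pieces, function(x) {
  x$New <- x$C / x$Z[x$year == 2003]
  x
})

# put the pieces back together in the original row order
res <- unsplit(pieces, df$country)
head(res)
```

Because unsplit() restores the original row order, this sketch does not need a separate reshaping step.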

Answer 1: (score: 1)

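Your data does not match the example you shared in your post. You need to use set.seed() to make it reproducible. Anyway, here is a solution using dplyr:

library(dplyr)

set.seed(42)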

df <- data.frame(country = rep(x = LETTERS[1:4], each = 5), 
                 year = rep(2001:2005),
                 C=runif(20,30,100),
                 Z=rnorm(20, mean = 0, sd = 1))

df %>%
  group_by(country) %>%
  mutate(
    New = C / Z[year == 2003]
  ) %>%
  pull(New)

# [1]  -67.70760  -68.83000  -36.02216  -63.45585  -53.94507  -24.97189  -30.70301
# [8]  -14.84183  -28.60558  -29.87234 -360.88226 -467.30510 -555.07518 -278.50602
# [15] -362.73532  -54.33474  -55.85181  -21.67929  -35.87291  -39.26086

Answer 2: (score: 1)

A data.table alternative to @Shree's dplyr answer:

library(data.table)
set.seed(42)
dt <- data.table(country = rep(x = LETTERS[1:4], each = 5), year = rep(2001:2005), C=runif(20,30,100), Z=rnorm(20, mean = 0, sd = 1))
dt[,New := C/Z[year==2003],by="country"]
dt
#     country year        C          Z        New
#  1:       A 2001 94.03642  1.3048697  -67.70760
#  2:       A 2002 95.59528  2.2866454  -68.83000
#  3:       A 2003 50.02977 -1.3888607  -36.02216
#  4:       A 2004 88.13133 -0.2787888  -63.45585
#  5:       A 2005 74.92219 -0.1333213  -53.94507
#  6:       B 2001 66.33672  0.6359504  -24.97189
#  7:       B 2002 81.56118 -0.2842529  -30.70301
#  8:       B 2003 39.42666 -2.6564554  -14.84183
#  9:       B 2004 75.98946 -2.4404669  -28.60558
# 10:       B 2005 79.35453  1.3201133  -29.87234
# 11:       C 2001 62.04192 -0.3066386 -360.88226
# 12:       C 2002 80.33786 -1.7813084 -467.30510
# 13:       C 2003 95.42706 -0.1719174 -555.07518
# 14:       C 2004 47.88002  1.2146747 -278.50602
# 15:       C 2005 62.36050  1.8951935 -362.73532
# 16:       D 2001 95.80102 -0.4304691  -54.33474
# 17:       D 2002 98.47585 -0.2572694  -55.85181
# 18:       D 2003 38.22412 -1.7631631  -21.67929
# 19:       D 2004 63.24980  0.4600974  -35.87291
# 20:       D 2005 69.22329 -0.6399949  -39.26086

And an option that depends on neither data.table nor dplyr:

do.call(rbind,
        by(df, df$country, FUN = function(a) transform(a, New = C/Z[year==2003])))

Answer 3: (score: 1)

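Another solution, using base R:

  1. Extract the 2003 Z value for each country

    v1 <- df[df$year==2003,4]

  2. Create a divisor vector of the right length (this assumes the rows are sorted by country, with five rows per country)

    z_2003 <- rep(x = v1[1:4],each = 5)

  3. New vector: C divided by the Z value of the chosen year

    new <- df$C / z_2003

  4. If you want to combine the new columns with the original data frame

    df2 <- cbind(df,z_2003,new)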
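
The steps above rely on the rows being sorted by country with the same number of rows in each group. As a hedged alternative, the divisor can be looked up by country name, which does not depend on row order. This is a minimal sketch; ref and z_by_country are illustrative names, not part of the original answer.

```r
set.seed(42)
df <- data.frame(country = rep(LETTERS[1:4], each = 5),
                 year = rep(2001:2005, times = 4),
                 C = runif(20, 30, 100),
                 Z = rnorm(20, mean = 0, sd = 1))

# one row per country: the Z value observed in 2003
ref <- df[df$year == 2003, c("country", "Z")]

# named lookup vector: country -> Z in 2003
z_by_country <- setNames(ref$Z, as.character(ref$country))

# index the lookup by each row's country, then divide C by it
df$New <- df$C / unname(z_by_country[as.character(df$country)])
head(df)
```

A country with no 2003 row would get NA in New under this sketch, since the lookup finds no matching name.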