How to assign values from a different data frame based on multiple conditions in R

Time: 2019-06-11 21:25:19

Tags: r

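I have a data frame called "data" with the columns "Date, Month, Discharge and station". Another data frame, called "perc", has the columns "Month, W1_Percentile and B1_Percentile". W1_Percentile and B1_Percentile are the monthly percentile values for each gauging station. I want my final output to have the same columns as data, plus an additional "Percentile" column holding the percentile value for the corresponding month and gauging station (the per-station value for each month is stored in perc). What steps should I follow?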

Here is a sample of the input data:

date <- as.Date(c('1950-03-12','1954-03-23','1991-06-27','1997-09-04','1991-06-27','1987-05-06','1987-05-29','1856-07-08','1993-06-04', '2001-09-19','2001-05-06','2001-05-27'))
month <- c('Mar','Mar','Jun','Sep','Jun','May','May','Jul','Jun','Sep','May','May')
disch <- c(125,1535,1654,154,4654,453,1654,145,423,433,438,6426)
station <- c('W1','W1','W1','W1','W1','W1','B1','B1','B1','B1','B1','B1')
data <- data.frame("Date"= date, "Month" = month,"Discharge"=disch,"station"=station)

         Date Month Discharge station
1  1950-03-12   Mar       125      W1
2  1954-03-23   Mar      1535      W1
3  1991-06-27   Jun      1654      W1
4  1997-09-04   Sep       154      W1
5  1991-06-27   Jun      4654      W1
6  1987-05-06   May       453      W1
7  1987-05-29   May      1654      B1
8  1856-07-08   Jul       145      B1
9  1993-06-04   Jun       423      B1
10 2001-09-19   Sep       433      B1
11 2001-05-06   May       438      B1
12 2001-05-27   May      6426      B1

Month <- c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec')
W1 <- c(106,313,531.40,164.10,40,23.39,18.30,24,16,16,12,34)
B1 <- c(1330,1550,1948,1880,1260,853.15,680.15,486.10,503,625,738,1070)
perc <- data.frame("Month"=Month,"W1_Percentile"=W1,"B1_Percentile"=B1)

   Month W1_Percentile B1_Percentile
1    Jan        106.00       1330.00
2    Feb        313.00       1550.00
3    Mar        531.40       1948.00
4    Apr        164.10       1880.00
5    May         40.00       1260.00
6    Jun         23.39        853.15
7    Jul         18.30        680.15
8    Aug         24.00        486.10
9    Sep         16.00        503.00
10   Oct         16.00        625.00
11   Nov         12.00        738.00
12   Dec         34.00       1070.00

This is what I would like the final output to look like:

         Date Month Discharge station Percentile
1  1950-03-12   Mar       125      W1     531.40
2  1954-03-23   Mar      1535      W1     531.40
3  1991-06-27   Jun      1654      W1      23.39
4  1997-09-04   Sep       154      W1      16.00
5  1991-06-27   Jun      4654      W1      23.39
6  1987-05-06   May       453      W1      40.00
7  1987-05-29   May      1654      B1    1260.00
8  1856-07-08   Jul       145      B1     680.15
9  1993-06-04   Jun       423      B1     853.15
10 2001-09-19   Sep       433      B1     503.00
11 2001-05-06   May       438      B1    1260.00
12 2001-05-27   May      6426      B1    1260.00

1 Answer:

Answer 0: (score: 0)

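We first need to convert your perc data to long format, so that it has a single station column and a single percentile column that can be added to data. After that, it is a simple join: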

library(tidyr)
library(dplyr)

# make the column names the same as the values in data
names(perc)[2:3] = c("W1", "B1")
# convert to long format
perc_long = gather(perc, key = "station", value = "percentile", W1, B1)

# join
left_join(data, perc_long)
# Joining, by = c("Month", "station")
#          Date Month Discharge station percentile
# 1  1950-03-12   Mar       125      W1     531.40
# 2  1954-03-23   Mar      1535      W1     531.40
# 3  1991-06-27   Jun      1654      W1      23.39
# 4  1997-09-04   Sep       154      W1      16.00
# 5  1991-06-27   Jun      4654      W1      23.39
# 6  1987-05-06   May       453      W1      40.00
# 7  1987-05-29   May      1654      B1    1260.00
# 8  1856-07-08   Jul       145      B1     680.15
# 9  1993-06-04   Jun       423      B1     853.15
# 10 2001-09-19   Sep       433      B1     503.00
# 11 2001-05-06   May       438      B1    1260.00
# 12 2001-05-27   May      6426      B1    1260.00
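
In newer versions of tidyr, `gather()` is superseded by `pivot_longer()`. The following is a minimal sketch of the same reshape-then-join approach, assuming tidyr 1.0.0 or later and the sample data from the question. The `names_pattern` argument strips the `_Percentile` suffix so the renaming step is not needed, and the value column is named `Percentile` to match the requested output. The `result` name and the use of `stringsAsFactors = FALSE` are choices made for this sketch, not part of the original answer.

```r
library(tidyr)
library(dplyr)

# Sample data from the question; keys are kept as character vectors
data <- data.frame(
  Date = as.Date(c('1950-03-12','1954-03-23','1991-06-27','1997-09-04',
                   '1991-06-27','1987-05-06','1987-05-29','1856-07-08',
                   '1993-06-04','2001-09-19','2001-05-06','2001-05-27')),
  Month = c('Mar','Mar','Jun','Sep','Jun','May','May','Jul','Jun','Sep','May','May'),
  Discharge = c(125,1535,1654,154,4654,453,1654,145,423,433,438,6426),
  station = c('W1','W1','W1','W1','W1','W1','B1','B1','B1','B1','B1','B1'),
  stringsAsFactors = FALSE
)
perc <- data.frame(
  Month = c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'),
  W1_Percentile = c(106,313,531.40,164.10,40,23.39,18.30,24,16,16,12,34),
  B1_Percentile = c(1330,1550,1948,1880,1260,853.15,680.15,486.10,503,625,738,1070),
  stringsAsFactors = FALSE
)

# Reshape to one row per (Month, station) pair;
# the capture group in names_pattern keeps only the station prefix
perc_long <- pivot_longer(
  perc,
  cols = ends_with("_Percentile"),
  names_to = "station",
  names_pattern = "(.*)_Percentile",
  values_to = "Percentile"
)

# Join on both keys, stated explicitly rather than inferred
result <- left_join(data, perc_long, by = c("Month", "station"))
result
```

Passing `by` explicitly avoids the "Joining, by = ..." message and guards against an accidental join on any other column that happens to share a name in both data frames.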

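There are many ways to do each of these operations; this is essentially a combination of two R-FAQs (reshaping data from wide to long format, and joining data frames).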
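
As one illustration of another route, here is a base R sketch that needs no packages. It assumes the sample data from the question and that every station in `data` has a matching `<station>_Percentile` column in `perc`. The percentile columns are treated as a lookup matrix with months as row names, and one value is picked per row by indexing with a two-column (row name, column name) character matrix. The name `lookup` is introduced for this sketch only.

```r
# Sample data from the question
data <- data.frame(
  Date = as.Date(c('1950-03-12','1954-03-23','1991-06-27','1997-09-04',
                   '1991-06-27','1987-05-06','1987-05-29','1856-07-08',
                   '1993-06-04','2001-09-19','2001-05-06','2001-05-27')),
  Month = c('Mar','Mar','Jun','Sep','Jun','May','May','Jul','Jun','Sep','May','May'),
  Discharge = c(125,1535,1654,154,4654,453,1654,145,423,433,438,6426),
  station = c('W1','W1','W1','W1','W1','W1','B1','B1','B1','B1','B1','B1'),
  stringsAsFactors = FALSE
)
perc <- data.frame(
  Month = c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'),
  W1_Percentile = c(106,313,531.40,164.10,40,23.39,18.30,24,16,16,12,34),
  B1_Percentile = c(1330,1550,1948,1880,1260,853.15,680.15,486.10,503,625,738,1070),
  stringsAsFactors = FALSE
)

# Numeric lookup matrix: rows are months, columns are the percentile columns
lookup <- as.matrix(perc[, c("W1_Percentile", "B1_Percentile")])
rownames(lookup) <- perc$Month

# For each observation, pick the cell at (its month, its station's column)
data$Percentile <- lookup[cbind(as.character(data$Month),
                                paste0(data$station, "_Percentile"))]
data
```

This avoids reshaping entirely, but it relies on the column naming convention; the join-based approach is usually easier to extend when more stations or more key columns are added.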