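Assign a value to a column based on conditions across rows

Time: 2018-01-31 16:23:47

Tags: r

I'm still new to R, and I'm trying to create a column whose value depends on a condition across the other values in the row. Basically, I want to create a fourth numeric column (in my dummy data below) that contains the year (the column header) of the maximum value.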

country = c("USA", "Canada", "UK", "France", "Mexico")
region = c("North America", "North America", "Europe", "Europe", "North America")  
x2000 = c(50, 100, 75, 60, 25) 
x2001 = c(200, 50, 150, 125, 60)
x2002 = c(125,75,60, 75, 25)
df = data.frame(country, region, x2000, x2001, x2002) 

   country    region        x2000  x2001  x2002
1     USA  North America      50    200    125
2  Canada  North America     100     50     75
3      UK     Europe          75    150     60
4  France     Europe          60    125     75
5  Mexico  North America      25     60     25

So, I'd like it to look like this:

country = c("USA", "Canada", "UK", "France", "Mexico")
region = c("North America", "North America", "Europe", "Europe", "North America")  
x2000 = c(50, 100, 75, 60, 25) 
x2001 = c(200, 50, 150, 125, 60)
x2002 = c(125,75,60, 75, 25)
Max_Year = c(2001, 2000, 2001, 2001, 2001)
df = data.frame(country, region, x2000, x2001, x2002, Max_Year) 


   country    region        x2000  x2001  x2002  Max_Year
1     USA  North America      50    200    125    2001
2  Canada  North America     100     50     75    2000
3      UK     Europe          75    150     60    2001
4  France     Europe          60    125     75    2001
5  Mexico  North America      25     60     25    2001

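I found this question, which seems the most similar to what I'm trying to do ({if...else..} statement after group_by in dplyr chain). But I want it to return a value based on the maximum of all the potential values.

Of course, my actual data is much larger, and I need to group the data by two variables (region is a simplification here, but I need to keep it in my dataset) and then apply the condition within each "group".

This is what I tried, based on the example question I found.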

Data_Tidy$Max_Year <- Data_Tidy %>%
group_by(Data_Tidy$Country, Data_Tidy$area)%>%
do(model=if(Data_Tidy$Landing = pmax())

I can have the years as column headers (as above) or as a tidy long data.frame, which I would then collapse back into a wide frame.

     country     region      Year   Value
 1      USA   North America  2000    50
 2   Canada   North America  2000   100
 3       UK       Europe     2000    75
 4   France       Europe     2000    60
 5   Mexico   North America  2000    25
 6      USA   North America  2001   200
 7   Canada   North America  2001    50
 8       UK       Europe     2001   150
 9   France       Europe     2001   125
10  Mexico    North America  2001    60
11     USA    North America  2002   125
12  Canada    North America  2002    75
13      UK       Europe      2002    60
14  France       Europe      2002    75
15  Mexico    North America  2002    25

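I hope this is clear. Any suggestions would be greatly appreciated!!

2 answers:

Answer 0: (score: 1)

You can use apply row-wise to get the index of the column holding the maximum.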

df$max <- names(df)[apply(df[3:5], 1, which.max) + 2]
df
  country        region x2000 x2001 x2002   max
1     USA North America    50   200   125 x2001
2  Canada North America   100    50    75 x2000
3      UK        Europe    75   150    60 x2001
4  France        Europe    60   125    75 x2001
5  Mexico North America    25    60    25 x2001

You can easily get rid of the 'x' in front of the year. Or use the vector c(2000, 2001, 2002) instead of the data.frame names.
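Both suggestions can be sketched roughly as follows. This is only an illustration on the dummy data from the question; the helper names `year_cols`, `idx`, `years` and `max_year_alt` are made up here and are not part of the original answer.

```r
# Dummy data from the question
df <- data.frame(
  country = c("USA", "Canada", "UK", "France", "Mexico"),
  region  = c("North America", "North America", "Europe", "Europe", "North America"),
  x2000   = c(50, 100, 75, 60, 25),
  x2001   = c(200, 50, 150, 125, 60),
  x2002   = c(125, 75, 60, 75, 25)
)

# Columns that hold the yearly values (hypothetical helper name)
year_cols <- c("x2000", "x2001", "x2002")

# Position of the row maximum among the year columns.
# which.max() returns the first position when a row has ties.
idx <- apply(df[year_cols], 1, which.max)

# Option 1: strip the leading "x" from the column names
df$Max_Year <- as.numeric(sub("^x", "", year_cols))[idx]

# Option 2: index an explicit vector of years instead
years <- c(2000, 2001, 2002)
max_year_alt <- years[idx]
```

Selecting only the numeric year columns before calling apply keeps the row values numeric, so the comparison does not depend on how the character columns would be coerced.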

Answer 1: (score: 0)

 transform(df,max_year=as.numeric(sub("\\D","",names(df[-(1:2)])[max.col(df[-(1:2)])])))
  country        region x2000 x2001 x2002 max_year
1     USA North America    50   200   125     2001
2  Canada North America   100    50    75     2000
3      UK        Europe    75   150    60     2001
4  France        Europe    60   125    75     2001
5  Mexico North America    25    60    25     2001
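The question also mentions a long layout grouped by two variables, which neither answer covers. A minimal base R sketch of one way to handle it, assuming the long frame has the columns country, region, Year and Value shown in the question. The names `long`, `ord` and `max_year` are made up for illustration, and resolving ties toward the earliest year is an assumption, not something the question specifies.

```r
# Long ("tidy") version of the dummy data from the question
long <- data.frame(
  country = rep(c("USA", "Canada", "UK", "France", "Mexico"), times = 3),
  region  = rep(c("North America", "North America", "Europe", "Europe",
                  "North America"), times = 3),
  Year    = rep(c(2000, 2001, 2002), each = 5),
  Value   = c(50, 100, 75, 60, 25,
              200, 50, 150, 125, 60,
              125, 75, 60, 75, 25)
)

# Sort so that, within each country/region group, the largest Value comes
# first; ties are broken by the earliest Year (an assumption).
ord <- long[order(long$country, long$region, -long$Value, long$Year), ]

# Keep only the first row of every country/region group
max_year <- ord[!duplicated(ord[c("country", "region")]),
                c("country", "region", "Year")]
names(max_year)[3] <- "Max_Year"
rownames(max_year) <- NULL

max_year
```

If the wide frame is needed afterwards, `max_year` can be joined back onto it with `merge()` on country and region.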