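I'm still new to R, and I'm trying to create a column whose values depend on conditions across the other columns. Basically, I want to create a fourth column (in my dummy data below) that contains the year (column header) of the maximum value.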
country = c("USA", "Canada", "UK", "France", "Mexico")
region = c("North America", "North America", "Europe", "Europe", "North America")
x2000 = c(50, 100, 75, 60, 25)
x2001 = c(200, 50, 150, 125, 60)
x2002 = c(125,75,60, 75, 25)
df = data.frame(country, region, x2000, x2001, x2002)
country region x2000 x2001 x2002
1 USA North America 50 200 125
2 Canada North America 100 50 75
3 UK Europe 75 150 60
4 France Europe 60 125 75
5 Mexico North America 25 60 25
So I'd like it to look like this:
country = c("USA", "Canada", "UK", "France", "Mexico")
region = c("North America", "North America", "Europe", "Europe", "North America")
x2000 = c(50, 100, 75, 60, 25)
x2001 = c(200, 50, 150, 125, 60)
x2002 = c(125,75,60, 75, 25)
Max_Year = c(2001, 2000, 2001, 2001, 2001)
df = data.frame(country, region, x2000, x2001, x2002, Max_Year)
country region x2000 x2001 x2002 Max_Year
1 USA North America 50 200 125 2001
2 Canada North America 100 50 75 2000
3 UK Europe 75 150 60 2001
4 France Europe 60 125 75 2001
5 Mexico North America 25 60 25 2001
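I found this question, which seems closest to what I'm trying to do ({if...else..} statement after group_by in dplyr chain), but I want it to return a value based on the maximum across all the candidate values.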
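Of course, my actual data is much larger, and I need to group the data by two variables (region is a simplification here, but I need to keep it in my dataset) and then apply the condition within each "group".
Here is what I tried, based on the example question I found: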
Data_Tidy$Max_Year <- Data_Tidy %>%
group_by(Data_Tidy$Country, Data_Tidy$area)%>%
do(model=if(Data_Tidy$Landing = pmax())
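For reference, a minimal sketch of a working dplyr pipeline on the wide data. It assumes dplyr >= 1.1.0 (for pick()) and that each row is already one country/region combination, so no group_by() is needed for this step. The vector `years` is a hypothetical helper that must list the years in the same order as the x2000:x2002 columns.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, which provides pick()

df <- data.frame(
  country = c("USA", "Canada", "UK", "France", "Mexico"),
  region  = c("North America", "North America", "Europe", "Europe", "North America"),
  x2000 = c(50, 100, 75, 60, 25),
  x2001 = c(200, 50, 150, 125, 60),
  x2002 = c(125, 75, 60, 75, 25)
)

# Hypothetical helper vector: must match the order of the x2000:x2002 columns
years <- c(2000, 2001, 2002)

df <- df %>%
  # pick() returns the year columns as a data frame; max.col() gives, per row,
  # the position of the largest value ("first" breaks ties deterministically)
  mutate(Max_Year = years[max.col(pick(x2000:x2002), ties.method = "first")])
```

Each row's position (1 to 3) is used to index `years`, so the result is numeric rather than a column name.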
I can have the years as column headers (as above) or as a tidy long data.frame, which I could then collapse back into a wide frame:
country region Year Value
1 USA North America 2000 50
2 Canada North America 2000 100
3 UK Europe 2000 75
4 France Europe 2000 60
5 Mexico North America 2000 25
6 USA North America 2001 200
7 Canada North America 2001 50
8 UK Europe 2001 150
9 France Europe 2001 125
10 Mexico North America 2001 60
11 USA North America 2002 125
12 Canada North America 2002 75
13 UK Europe 2002 60
14 France Europe 2002 75
15 Mexico North America 2002 25
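If the data is kept in the long format, the grouped version could look like the sketch below. It assumes tidyr >= 1.0 (pivot_longer()) and dplyr >= 1.0 (slice_max()); the intermediate names `long` and `max_year` are illustrative only. The maximum is found within each country/region group and then joined back onto the wide frame.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  country = c("USA", "Canada", "UK", "France", "Mexico"),
  region  = c("North America", "North America", "Europe", "Europe", "North America"),
  x2000 = c(50, 100, 75, 60, 25),
  x2001 = c(200, 50, 150, 125, 60),
  x2002 = c(125, 75, 60, 75, 25)
)

# Wide -> long: one row per country/region/year; strip the "x" and make Year an integer
long <- df %>%
  pivot_longer(x2000:x2002, names_to = "Year", names_prefix = "x",
               values_to = "Value", names_transform = list(Year = as.integer))

# Within each country/region group keep only the row with the largest Value
max_year <- long %>%
  group_by(country, region) %>%
  slice_max(Value, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(country, region, Max_Year = Year)

# Join the result back onto the wide data
df <- left_join(df, max_year, by = c("country", "region"))
```

With `with_ties = FALSE` a tied maximum yields a single row per group; drop that argument to see every tied year instead.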
I hope this is clear. Any suggestions would be greatly appreciated!
Answer 0 (score: 1)
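You can use apply row-wise to get the index of the column holding the maximum.
# which.max() on each row of the numeric columns 3:5; +2 maps the position back to names(df)
df$max <- names(df)[apply(df[3:5], 1, which.max) + 2]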
df
country region x2000 x2001 x2002 max
1 USA North America 50 200 125 x2001
2 Canada North America 100 50 75 x2000
3 UK Europe 75 150 60 x2001
4 France Europe 60 125 75 x2001
5 Mexico North America 25 60 25 x2001
You can easily strip the 'x' in front of the year, or use the vector c(2000, 2001, 2002) instead of the data.frame names.
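Both follow-ups can be sketched in base R as below. The names `pos` and `max_year` are illustrative, and the year vector is assumed to match the column order x2000, x2001, x2002.

```r
df <- data.frame(
  country = c("USA", "Canada", "UK", "France", "Mexico"),
  region  = c("North America", "North America", "Europe", "Europe", "North America"),
  x2000 = c(50, 100, 75, 60, 25),
  x2001 = c(200, 50, 150, 125, 60),
  x2002 = c(125, 75, 60, 75, 25)
)

# which.max() on each row of the numeric year columns gives a position 1-3
pos <- apply(df[3:5], 1, which.max)

# Option 1: look up the column name and strip the leading "x"
df$max <- as.numeric(sub("^x", "", names(df)[3:5][pos]))

# Option 2: index a vector of years directly (assumed to match the column order)
df$max_year <- c(2000, 2001, 2002)[pos]
```

Option 1 keeps working if year columns are added, as long as they stay in positions 3 onward and the index range is updated; Option 2 needs the vector kept in sync by hand.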
Answer 1 (score: 0)
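# max.col() finds each row's maximum column; sub("\\D", "", ...) strips the non-digit "x"
transform(df, max_year = as.numeric(sub("\\D", "", names(df[-(1:2)])[max.col(df[-(1:2)], ties.method = "first")])))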
country region x2000 x2001 x2002 max_year
1 USA North America 50 200 125 2001
2 Canada North America 100 50 75 2000
3 UK Europe 75 150 60 2001
4 France Europe 60 125 75 2001
5 Mexico North America 25 60 25 2001