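Hello, I would like to know the simplest command (without using any additional library such as dplyr) to find the second-highest salary in the R data frame below and store the employee's name in a variable called 2nd_high_employee?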
EmployeeID EmployeeName Department Salary
----------- --------------- --------------- ---------
1 T Cook Finance 40000.00
2 D Michael Finance 25000.00
3 A Smith Finance 25000.00
4 D Adams Finance 15000.00
5 M Williams IT 80000.00
6 D Jones IT 40000.00
7 J Miller IT 50000.00
8 L Lewis IT 50000.00
9 A Anderson Back-Office 25000.00
10 S Martin Back-Office 15000.00
11 J Garcia Back-Office 15000.00
12 T Clerk Back-Office 10000.00
Answer 0 (score: 2)
Next time, consider posting a sample of your data with dput(head(x)) so that SO members can read it in easily.
df <- read.table(text = "
EmployeeID EmployeeName Department Salary
1 T Cook Finance 40000.00
2 D Michael Finance 25000.00
3 A Smith Finance 25000.00
4 D Adams Finance 15000.00
5 M Williams IT 80000.00
6 D Jones IT 40000.00
7 J Miller IT 50000.00
8 L Lewis IT 50000.00
9 A Anderson Back-Office 25000.00
10 S Martin Back-Office 15000.00
11 J Garcia Back-Office 15000.00
12 T Clerk Back-Office 10000.00", header = T)
second_high_employee <- tail(sort(df$Salary),2)[1]
second_high_employee
[1] 50000
By the way, an object name cannot start with a digit, which is why the variable is called second_high_employee here. See ?make.names
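Note that second_high_employee as computed above holds the salary (50000), not a name. A minimal base-R sketch for getting the names follows. It assumes a hypothetical data frame `emp` built by hand from the question's table, with each full name kept in one column; `emp` and `second_salary` are illustrative names, not part of the original answer.

```r
# Hypothetical stand-in for the question's data, with full names in one column
emp <- data.frame(
  EmployeeName = c("T Cook", "D Michael", "A Smith", "D Adams",
                   "M Williams", "D Jones", "J Miller", "L Lewis",
                   "A Anderson", "S Martin", "J Garcia", "T Clerk"),
  Salary = c(40000, 25000, 25000, 15000, 80000, 40000,
             50000, 50000, 25000, 15000, 15000, 10000),
  stringsAsFactors = FALSE
)

# Second-highest distinct salary
second_salary <- sort(unique(emp$Salary), decreasing = TRUE)[2]

# Names of everyone who earns that salary (there may be more than one)
second_high_employee <- emp$EmployeeName[emp$Salary == second_salary]
second_high_employee
# [1] "J Miller" "L Lewis"
```

Because two employees share the second-highest salary, the result is a character vector of length two rather than a single name.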
In addition, you can do the same for each department:
aggregate(Salary ~ Department, df, function(x) {tail(sort(x), 2)[1]})
Department Salary
1 Back-Office 15000
2 Finance 25000
3 IT 50000
If the two highest salaries are both 80000 and you still want 50000 as the second-highest salary, wrap x or df$Salary in unique(), i.e. tail(sort(unique(x)), 2)[1]
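A small sketch of the difference, using a made-up salary vector with a tie at the top (the values and variable names are illustrative only):

```r
# Made-up salaries where the highest value appears twice
x <- c(80000, 80000, 50000, 40000)

# Without unique(): the tie fills both of the top two slots
with_ties <- tail(sort(x), 2)[1]
with_ties
# [1] 80000

# With unique(): duplicates are dropped before taking the second-largest value
distinct <- tail(sort(unique(x)), 2)[1]
distinct
# [1] 50000
```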
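Answer 1 (score: 1)
Using base R to find the second-highest salary:
If you want the matching rows regardless of department (with the data frame stored in dat):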
subset(dat,sort(z<-rank(Salary),T)[2]==z)
EmployeeID EmployeeName Department Salary
7 J Miller IT 50000
8 L Lewis IT 50000
If you want it per department:
unsplit(by(dat,dat$Department,function(x)subset(x,(y<-rank(Salary))==sort(y,T)[2])),rep(1:3,each=2))
EmployeeID EmployeeName Department Salary
10 S Martin Back-Office 15000
11 J Garcia Back-Office 15000
2 D Michael Finance 25000
3 A Smith Finance 25000
7 J Miller IT 50000
8 L Lewis IT 50000
To get just the employee names:
as.character(subset(dat,sort(z<-rank(Salary),T)[2]==z)[,2])
[1] "Miller" "Lewis"
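One caveat with the rank-based approach: rank() averages tied ranks by default, so if the top salary itself is tied, sort(z, T)[2] points back at the top group. A possible alternative is a dense rank built from match() and unique(), sketched below. The helper dense_rank_desc, the column SalaryRank and the reduced two-department data frame are assumptions made for illustration, not part of the original answer.

```r
# Reduced, hand-built copy of the data (two departments only), full names kept
dat <- data.frame(
  EmployeeName = c("M Williams", "D Jones", "J Miller", "L Lewis",
                   "T Cook", "D Michael", "A Smith", "D Adams"),
  Department = c("IT", "IT", "IT", "IT",
                 "Finance", "Finance", "Finance", "Finance"),
  Salary = c(80000, 40000, 50000, 50000, 40000, 25000, 25000, 15000),
  stringsAsFactors = FALSE
)

# Dense rank, descending: 1 = highest distinct salary, 2 = next distinct value, ...
dense_rank_desc <- function(x) match(x, sort(unique(x), decreasing = TRUE))

# Apply the ranking within each department
dat$SalaryRank <- ave(dat$Salary, dat$Department, FUN = dense_rank_desc)

# Rows holding the second-highest distinct salary of their department
second_by_dept <- dat[dat$SalaryRank == 2, c("EmployeeName", "Department", "Salary")]
second_by_dept$EmployeeName
# [1] "J Miller"  "L Lewis"   "D Michael" "A Smith"
```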