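Beginning R programmer here. I want to find the person in a data frame who is most compatible with a given person. Compatibility is based on an algorithm that assigns points for certain values in the data frame. I have a data frame called kewl.d00dz that looks like this: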
name dream.name birth.state birth.month birth.date major
1 stephen butch CO oct 11 ELEC
2 clark richard VA jan 19 BUAD
3 anthony bo NJ mar 26 BUAD
4 jack kordell VA jul 27 BUAD
5 eric adrian ND jun 17 GEOG
6 tyler anthony VA apr 12 CPSC
7 olivia isabella VA may 29 MATH
8 brad harvey HI aug 21 BUAD
9 hannah charlie VA aug 28 PSYC
10 will ronald VA may 11 BUAD
11 noor ani CA apr 14 BUAD
12 victoria elizabeth VA jan 11 MATH
13 morgan c lauren FL jun 15 BUAD
14 morgan w elizabeth VA feb 21 ARTS
15 helena helena VA apr 26 BIOL
16 amber amber leigh VA dec 6 PSCI
17 ekta kate VA apr 14 ARTH
18 caroline georgia DC jun 20 BUAD
19 anna abby VA sep 21 BUAD
20 nate julio VA sep 5 ECON
21 jessica jeanette VA oct 7 BUAD
22 shaina skylar VA sep 2 BUAD
23 ruth lucy VA jan 4 CPSC
24 sohyun caroline Seoul nov 16 PSYC
25 aaron don VA sep 1 ECON
26 alex axel VA sep 6 BIOL
cell num.bills num.states
1 none 5 41
2 apple 8 14
3 apple 4 14
4 apple 19 10
5 apple 6 19
6 samsung 1 10
7 apple 3 8
8 apple 1 18
9 apple 2 16
10 apple 5 20
11 apple 3 19
12 apple 5 17
13 apple 3 15
14 apple 4 24
15 android 0 18
16 apple 1 12
17 apple 1 19
18 apple 0 22
19 apple 0 27
20 samsung 4 32
21 samsung 5 11
22 apple 0 15
23 apple 7 30
24 apple 10 10
25 motorola 8 18
26 htc 3 20
I need to find the person who is most compatible with whoever I pass into the function:
source("compatibility.R")
find.most.compatible<-function(x){
  a<-which(kewl.d00dz$name==x)
  x<-as.list(kewl.d00dz[a,])
  pts<-list()
  namez<-list()
  for (i in 1:nrow(kewl.d00dz)){
    y<-as.list(kewl.d00dz[i,])
    pts[i]<-compatibility(x,y)
    namez[i]<-kewl.d00dz[i,"name"]
    names(pts)<-namez
  }
  n<-length(pts)
  (which(pts == sort(pts,partial=n-1)[n-1]))
}
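I want it to return the second-highest value, because if it returned the highest one, the person would simply be most compatible with themselves. But it gives me this error message: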
> find.most.compatible("stephen")
02727312231332325212224261723292219149302611312321
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
'x' must be atomic
This is the function I call inside the function above. I don't want to change its code:
compatibility<-function(x,y){
  #start point bag
  com.points<-0
  #number of bills compatibility points
  com.points<-com.points +(10-abs(as.integer(x[["num.bills"]] - y[["num.bills"]])))
  #different number of states compatibility points
  diff.states<-abs(as.integer(x[["num.states"]]-y[["num.states"]]))
  cat(diff.states)
  if(diff.states<5){
    com.points<-com.points+5
  } else if(diff.states<10){
    com.points<com.points+3
  } else {
    com.points<-com.points
  }
  #birth month compatibility points
  if(x[["birth.month"]]== "dec"||x[["birth.month"]]== "jan"||x[["birth.month"]]== "feb"){
    season1<-"winter"
  } else if(x[["birth.month"]]== "mar"|| x[["birth.month"]]== "apr" || x[["birth.month"]]== "may"){
    season1<-"spring"
  } else if(x[["birth.month"]]== "jun"||x[["birth.month"]]== "jul"||x[["birth.month"]]== "aug"){
    season1<-"summer"
  } else {
    season1<-"fall"
  }
  if(y[["birth.month"]]== "dec" || y[["birth.month"]]== "jan" || y[["birth.month"]] == "feb"){
    season2<-"winter"
  } else if(y[["birth.month"]]== "mar"||y[["birth.month"]]== "apr"||y[["birth.month"]]== "may"){
    season2<-"spring"
  } else if(y[["birth.month"]]== "jun"||y[["birth.month"]]== "jul"||y[["birth.month"]]== "aug"){
    season2<-"summer"
  } else {
    season2<-"fall"
  }
  if (x[["birth.month"]] == y[["birth.month"]]){
    com.points<-com.points + 3
  } else if(season1==season2){
    com.points<-com.points + 1
  } else {
    com.points<-com.points
  }
  #birth state compatibility points
  if (x[["birth.state"]]==y[["birth.state"]]){
    com.points<-com.points + 1
  } else {
    com.points<-com.points
  }
  #major compatibility points
  if (x[["major"]]==y[["major"]]){
    com.points<-com.points + 4
  } else {
    com.points<-com.points
  }
  #cellular provider compatibility points
  if(x[["cell"]] == y[["cell"]]){
    com.points<-com.points + 2
  } else {
    com.points<-com.points
  }
  return(com.points)
}
Can someone troubleshoot my code without using any special functions such as apply, subset, and so on?
Only functions like which.max are allowed.
Answer 0 (score: 0)
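I haven't run your whole code, but I can see that you need to change your loop to something like this; otherwise your function will return on the first iteration.
I commented out the names(pts) line because that can also be done outside your loop, once all the items are in.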
pts <- list() # if you actually want a list. You could also do c() for a vector
for (i in 1:nrow(kewl.d00dz)) {
  y <- as.list(kewl.d00dz[i,])
  pts[i] <- compatibility(x,y)
  # names(pts) <- sprintf(kewl.d00dz[i,"name"],1:length(pts))
}
return(pts)
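As for the error itself: sort() refuses to work on a list, which is what "'x' must be atomic" means, so the scores need to live in a plain numeric vector (or be passed through unlist() first). Below is a minimal sketch of that idea, not a drop-in replacement. The small data frame `people`, the simplified scoring function `score.pair`, and the extra arguments `df` and `score.fun` are all hypothetical stand-ins so the example runs on its own; with the real data you would pass kewl.d00dz and the unchanged compatibility(). Instead of hunting for the second-highest value, it overwrites the person's own score with -Inf and then uses which.max.

```r
# Hypothetical stand-in data: a few rows and columns in the style of kewl.d00dz.
people <- data.frame(
  name      = c("stephen", "clark", "anthony", "jack"),
  major     = c("ELEC", "BUAD", "BUAD", "BUAD"),
  num.bills = c(5, 8, 4, 19),
  stringsAsFactors = FALSE
)

# Hypothetical, simplified stand-in for compatibility(); the real one scores more fields.
score.pair <- function(x, y) {
  pts <- 10 - abs(x[["num.bills"]] - y[["num.bills"]])
  if (x[["major"]] == y[["major"]]) {
    pts <- pts + 4
  }
  pts
}

find.most.compatible <- function(who, df, score.fun) {
  a <- which(df$name == who)
  x <- as.list(df[a, ])
  # Numeric vector instead of a list: sort() and which.max() need atomic input.
  pts <- numeric(nrow(df))
  for (i in 1:nrow(df)) {
    y <- as.list(df[i, ])
    pts[i] <- score.fun(x, y)
  }
  # Name the scores once, after the loop has filled them all in.
  names(pts) <- df$name
  # Rule out the person themselves so they cannot be their own best match.
  pts[a] <- -Inf
  names(pts)[which.max(pts)]
}

best <- find.most.compatible("clark", people, score.fun = score.pair)
print(best)
```

Note that which.max returns only the first maximum, so if two people tie for the top score this sketch reports whichever appears first in the data frame. Separately, the line `com.points<com.points+3` in compatibility() is a comparison rather than an assignment, so as written that branch never adds the 3 points.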