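Looping calculation in R

Time: 2012-11-16 16:51:10

Tags: r loops dataframe

I am having trouble performing a calculation that is defined iteratively. The following data serve as an example (the actual data set is much larger):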

## DATA ##
# Columns
   Individual<-c("A","B","C","D","E","F","G","H1","H2","H3","H4","H5","K1","K2","K3","K4","K5")
   P1<-c(0,0,"A",0,"C","C",0, rep("E",5),"H1","H2","H3","H4","H5")
   P2<-c(0,0,"B",0,"D", "E",0,rep("G",5),"H1","H2","H3","H4","H5")
# Dataframe
   myd<-data.frame(Individual,P1,P2,stringsAsFactors=FALSE)


   Individual P1 P2
1           A  0  0
2           B  0  0
3           C  A  B
4           D  0  0
5           E  C  D
6           F  C  E
7           G  0  0
8          H1  E  G
9          H2  E  G
10         H3  E  G
11         H4  E  G
12         H5  E  G
13         K1 H1 H1
14         K2 H2 H2
15         K3 H3 H3
16         K4 H4 H4
17         K5 H5 H5

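The data describe the relationship between each individual and its parents, P1 and P2. A value of 0 means the parent is unknown.

The required calculation, labelled relationA, expresses how closely each individual is related to A.

By definition, the relationship between A and A has the value 1. The values for all other individuals need to be calculated from the information in the table, as follows: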

The value of relationA for an individual should be equal to 
   1/2 (the value of relationA of P1 of the individual)  
 + 1/2 (the value of relationA of P2 of the individual)

For example:

  Individual P1 P2      relationA
1           A  0  0       1
2           B  0  0       0
3           C  A  B       (A = 1 + B = 0)/2 = 0.5
4           D  0  0       0
5           E  C  D       (C= 0.5 + D = 0)/2 = 0.25
6           F  C  E       (C = 0.5 + E = 0.25)/2 = 0.375  
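
The arithmetic in the table above can be checked by hand in R. This is only a sketch of the rule applied in dependency order; the vector name `rel` is made up for the illustration and is not part of the question.

```r
# A is 1 by definition; B and D have unknown parents, so (0 + 0) / 2 = 0
rel <- c(A = 1, B = 0, D = 0)

# Each remaining value is the mean of the two parents' values
rel[["C"]] <- (rel[["A"]] + rel[["B"]]) / 2
rel[["E"]] <- (rel[["C"]] + rel[["D"]]) / 2
rel[["F"]] <- (rel[["C"]] + rel[["E"]]) / 2

print(rel)
```

Note that each value can only be computed once both parents' values are known, which is what makes the calculation iterative.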

The expected output is as follows:

 Individual P1 P2  relationA
1           A  0  0   1
2           B  0  0   0
3           C  A  B   0.5
4           D  0  0   0
5           E  C  D   0.25
6           F  C  E   0.375
7           G  0  0   0 
8          H1  E  G   0.125
9          H2  E  G   0.125
10         H3  E  G   0.125
11         H4  E  G   0.125
12         H5  E  G   0.125
13         K1 H1 H1   0.125
14         K2 H2 H2   0.125
15         K3 H3 H3   0.125
16         K4 H4 H4   0.125
17         K5 H5 H5   0.125

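My difficulty is expressing this in R in an appropriate way. Any help would be greatly appreciated.

2 Answers:

Answer 0: (score: 4)

You can write a function that computes the value for a given individual, expressing the relationship (implicitly) as a simple recursive function.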

relationA <- function(ind) {
  if(ind == "A") {
    1
  } else if (ind == "0") {
    0
  } else {
    pts <- myd[myd$Individual == ind,]
    (relationA(pts[["P1"]]) + relationA(pts[["P2"]])) / 2
  }
}

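Put simply: if the individual is A, the value is 1; if the individual is 0, the value is 0; for anything else, call relationA recursively on each parent (P1 and P2) of that individual, add the results together, and divide by 2. This works for only one individual at a time: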

> relationA("A")
[1] 1
> relationA("F")
[1] 0.375
> relationA("K5")
[1] 0.125

But you can vectorize it over all individuals relatively easily:

> sapply(myd$Individual, relationA)
    A     B     C     D     E     F     G    H1    H2    H3    H4    H5    K1 
1.000 0.000 0.500 0.000 0.250 0.375 0.000 0.125 0.125 0.125 0.125 0.125 0.125 
   K2    K3    K4    K5 
0.125 0.125 0.125 0.125 
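
As a small variation, `vapply` can be used in place of `sapply` so that R checks that every call returns exactly one number. The sketch below repeats the data and the recursive function so it runs on its own; the name `rel_all` is made up for the illustration.

```r
# Example data from the question
Individual <- c("A","B","C","D","E","F","G","H1","H2","H3","H4","H5",
                "K1","K2","K3","K4","K5")
P1 <- c(0,0,"A",0,"C","C",0, rep("E",5),"H1","H2","H3","H4","H5")
P2 <- c(0,0,"B",0,"D","E",0, rep("G",5),"H1","H2","H3","H4","H5")
myd <- data.frame(Individual, P1, P2, stringsAsFactors = FALSE)

# Recursive definition from the answer
relationA <- function(ind) {
  if (ind == "A") {
    1
  } else if (ind == "0") {
    0
  } else {
    pts <- myd[myd$Individual == ind, ]
    (relationA(pts[["P1"]]) + relationA(pts[["P2"]])) / 2
  }
}

# numeric(1) declares that each call must return a single number
rel_all <- vapply(myd$Individual, relationA, numeric(1))
print(rel_all)
```

The result is the same named vector as with `sapply`; the difference is that `vapply` stops with an error if a call ever returns something other than a length-one numeric.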

The result can be assigned back to myd with:

myd$relationA <- sapply(myd$Individual, relationA)

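This is not particularly efficient, because it recomputes relationA over and over for each case. When it gets to "K5", it calls relationA("H5") twice; each of those calls relationA("E") and relationA("G"), which in turn call relationA("C"), relationA("D"), and relationA("0"), and so on. In other words, no results are cached; everything is recomputed each time. For a data set this small it does not matter, because even the inefficient version is still very fast.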
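
To see the repeated work concretely, one can count the calls. The sketch below is an instrumented copy of the recursive function; `relationA_counted` and the counter `n_calls` are names invented for this illustration.

```r
# Example data from the question
Individual <- c("A","B","C","D","E","F","G","H1","H2","H3","H4","H5",
                "K1","K2","K3","K4","K5")
P1 <- c(0,0,"A",0,"C","C",0, rep("E",5),"H1","H2","H3","H4","H5")
P2 <- c(0,0,"B",0,"D","E",0, rep("G",5),"H1","H2","H3","H4","H5")
myd <- data.frame(Individual, P1, P2, stringsAsFactors = FALSE)

n_calls <- 0  # counter kept outside the function, for illustration only

relationA_counted <- function(ind) {
  n_calls <<- n_calls + 1  # record every call, including repeated ones
  if (ind == "A") {
    1
  } else if (ind == "0") {
    0
  } else {
    pts <- myd[myd$Individual == ind, ]
    (relationA_counted(pts[["P1"]]) + relationA_counted(pts[["P2"]])) / 2
  }
}

value <- relationA_counted("K5")
print(value)
print(n_calls)
```

On the example data the counter ends at 27 for "K5", although only 9 distinct arguments are ever involved (K5, H5, E, G, C, D, A, B, and "0"). In a deeper pedigree the gap grows quickly, which is the motivation for caching.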

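If you want or need to cache the results and use that cache, you can modify relationA to do so.

relationAc <- function(ind) {
  pts <- myd[myd$Individual == ind,]
  # Compute only if the individual is not in the table ("0") or not cached yet
  if(nrow(pts) == 0 || any(is.na(pts[["relationA"]]))) {
    relationA <-
      if(ind == "A") {
        1
      } else if (ind == "0") {
        0
      } else {
        (relationAc(pts[["P1"]]) + relationAc(pts[["P2"]])) / 2
      }
    # Store the result in the global myd so later calls can reuse it
    myd[myd$Individual == ind, "relationA"] <<- relationA
    relationA
  } else {
    # Cached value is available
    pts[["relationA"]]
  }
}

Then you have to initialize the cache:

myd$relationA <- NA_real_

A single call fills in the values it needs, and calling it over the whole set of individuals fills in all the values.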
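
If modifying the global data frame from inside the function feels uncomfortable, the same caching idea can be sketched with a private environment as the cache. This is an alternative, not the answer's method; `make_relationA`, `relationA_cached`, and `cache` are hypothetical names, and the sketch assumes every parent other than "0" appears exactly once in the Individual column.

```r
# Example data from the question
Individual <- c("A","B","C","D","E","F","G","H1","H2","H3","H4","H5",
                "K1","K2","K3","K4","K5")
P1 <- c(0,0,"A",0,"C","C",0, rep("E",5),"H1","H2","H3","H4","H5")
P2 <- c(0,0,"B",0,"D","E",0, rep("G",5),"H1","H2","H3","H4","H5")
myd <- data.frame(Individual, P1, P2, stringsAsFactors = FALSE)

# Build a memoised lookup function for a given pedigree table
make_relationA <- function(ped) {
  cache <- new.env()
  assign("A", 1, envir = cache)  # A is related to itself with value 1
  assign("0", 0, envir = cache)  # unknown parents contribute 0

  rel <- function(ind) {
    # Reuse a stored value if this individual was already computed
    if (exists(ind, envir = cache, inherits = FALSE)) {
      return(get(ind, envir = cache, inherits = FALSE))
    }
    row <- ped[ped$Individual == ind, ]
    val <- (rel(row[["P1"]]) + rel(row[["P2"]])) / 2
    assign(ind, val, envir = cache)
    val
  }
  rel
}

relationA_cached <- make_relationA(myd)
myd$relationA <- vapply(myd$Individual, relationA_cached, numeric(1))
print(myd)
```

Each individual is computed at most once, and the cache lives inside the closure rather than in the data frame, so no initialization step with NA values is needed.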

Answer 1: (score: 3)

Edit:

More concisely, you can use sapply and rowSums to turn the for-loop into a single line of code:

# Initialize values of relationA
myd$relationA <- 0
myd$relationA[myd$Individual=="A"] <- 1

# Calculate relationA
myd$relationA <-   myd$relationA + rowSums(sapply(myd$Individual, function(indiv) 
     myd$relationA[myd$Individual==indiv]/2 * ((myd$P1==indiv) + (myd$P2==indiv))))
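
One thing to watch for: as written, the single expression above is evaluated once, entirely from the initial values, so on the sample data it updates only C (the one child of A) and leaves later generations at 0. A vectorized alternative that does propagate through all generations is to recompute every value from its parents repeatedly until nothing changes. The sketch below is one way to do that, not the answer's code; `i1`, `i2`, `parent_value`, and `rel` are names invented here, and it assumes the pedigree has no cycles.

```r
# Example data from the question
Individual <- c("A","B","C","D","E","F","G","H1","H2","H3","H4","H5",
                "K1","K2","K3","K4","K5")
P1 <- c(0,0,"A",0,"C","C",0, rep("E",5),"H1","H2","H3","H4","H5")
P2 <- c(0,0,"B",0,"D","E",0, rep("G",5),"H1","H2","H3","H4","H5")
myd <- data.frame(Individual, P1, P2, stringsAsFactors = FALSE)

# Row positions of each parent; NA where the parent is unknown ("0")
i1 <- match(myd$P1, myd$Individual)
i2 <- match(myd$P2, myd$Individual)

# Value of a parent, treating unknown parents as 0
parent_value <- function(idx, rel) ifelse(is.na(idx), 0, rel[idx])

is_A <- myd$Individual == "A"
rel <- as.numeric(is_A)  # start with A = 1, everyone else 0

# Without cycles, at most nrow(myd) passes are needed
for (pass in seq_len(nrow(myd))) {
  new_rel <- (parent_value(i1, rel) + parent_value(i2, rel)) / 2
  new_rel[is_A] <- 1  # A is fixed at 1 by definition
  if (isTRUE(all.equal(new_rel, rel))) break
  rel <- new_rel
}

myd$relationA <- rel
print(myd)
```

Because every pass recomputes values from scratch instead of adding to them, repeating the step is safe, and the order of the rows does not matter.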


Are you looking for something like this?

# Initialize values of relationA
myd$relationA <- 0
myd$relationA[myd$Individual=="A"] <- 1


# Iterate over all Individuals
for (indiv in myd$Individual) {

  indiVal <- myd$relationA[myd$Individual==indiv]

  # all rows are updated at once, thanks to vectorization; no need for myd$P1[i]
  myd$relationA <- myd$relationA  + 
                 indiVal/2 * ((myd$P1==indiv) + (myd$P2==indiv))
}

Output:

myd
   Individual P1 P2 relationA
1           A  0  0     1.000
2           B  0  0     0.000
3           C  A  B     0.500
4           D  0  0     0.000
5           E  C  D     0.250
6           F  C  E     0.375
7           G  0  0     0.000
8          H1  E  G     0.125
9          H2  E  G     0.125
...
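
The for-loop visits individuals in table order and reads each individual's value once, so it relies on every parent appearing in an earlier row than its children. That holds for the example data; for a larger data set it may be worth checking first. The sketch below is one possible check; the function name `parents_first` is hypothetical.

```r
# Example data from the question
Individual <- c("A","B","C","D","E","F","G","H1","H2","H3","H4","H5",
                "K1","K2","K3","K4","K5")
P1 <- c(0,0,"A",0,"C","C",0, rep("E",5),"H1","H2","H3","H4","H5")
P2 <- c(0,0,"B",0,"D","E",0, rep("G",5),"H1","H2","H3","H4","H5")
myd <- data.frame(Individual, P1, P2, stringsAsFactors = FALSE)

# TRUE if every known parent is listed in an earlier row than its child
parents_first <- function(ped) {
  row_id <- seq_len(nrow(ped))
  p1_row <- match(ped$P1, ped$Individual)  # NA for unknown parents
  p2_row <- match(ped$P2, ped$Individual)
  all(is.na(p1_row) | p1_row < row_id) &&
    all(is.na(p2_row) | p2_row < row_id)
}

print(parents_first(myd))                 # table as given
print(parents_first(myd[nrow(myd):1, ]))  # same table with rows reversed
```

If the check fails, the rows would need to be reordered so that parents come first before the loop is run, or an order-independent approach used instead.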