Question

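I have a function that cleans a data set so that only the observations I want are kept, but it contains many if statements. The codes are checked in the order of priority in which I want to keep them for each id.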

clear.data = function(x)
{  
  A = unique(x$code)

  if (4 %in% A  )
  {x = subset(x,code==4)}
  else if (10404 %in% A)
  {x = subset(x,code==10404)}
  else if (3942 %in% A)
  {x = subset(x,code==3942)}

  else {x=x}

  return(x)
}

For example, with the data x:

x = data.frame(id = c("A","A", "A", "B", "B","B", "B","C","C", "C","C"), 
               date = c( "29/05/2013", "23/08/2011", "25/09/2011",  "18/11/2011", "10/07/2013", "04/10/2011", "10/11/2011",  
                         "15/12/2011", "10/02/2008", "07/09/2009", "22/03/2012" ),
               code = c(4,4,3942,4,10404,3942,10404,10404,3942,10404,3942)      )

I then use lapply to keep, for each id, only the observations I am interested in:

> lapply(split(x,x$id),clear.data)
$A
  id       date code
1  A 29/05/2013    4
2  A 23/08/2011    4

$B
  id       date code
4  B 18/11/2011    4

$C
   id       date  code
8   C 15/12/2011 10404
10  C 07/09/2009 10404

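The problem is that I have 150 codes, which means a lot of if statements, and a large data set to apply the function to. Is there a way to reduce the number of if statements? I have searched a lot for a solution but could not find anything. Do you have any ideas? Thanks a lot.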

Answer 1

You can create a vector of all possible codes (in order of 'importance') and then simply take the first code that is found in that subset.

clear.data = function(x)
{  
    A = unique(x$code)

    codes <- c(4, 10404, 3942)

    # get boolean list of matches
    matches <- codes %in% A

    # if no matches, return x
    if(all(!matches)){
        return(x)
    }else{
        # else take first match
        sub_code <- codes[which.max(matches)]
        x <- subset(x, code == sub_code)
    }

    return(x)
}
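
The same priority idea can also be written with match(), which returns the position of each row's code in the priority vector, so the smallest position marks the most important code present. This is a minimal sketch, assuming the priority vector is passed in as an argument; the function name clear_data_by_priority and its codes parameter are illustrative and not part of the original answer.

```r
# Hypothetical helper: keep only the rows carrying the highest-priority code.
# `codes` lists the codes from most to least important (an assumed argument).
clear_data_by_priority <- function(x, codes) {
  # Position of each row's code in the priority vector; NA if not listed
  pos <- match(x$code, codes)

  # No listed code in this subset: return it unchanged
  if (all(is.na(pos))) {
    return(x)
  }

  # Keep the rows whose code has the smallest position (highest priority)
  x[which(pos == min(pos, na.rm = TRUE)), , drop = FALSE]
}

# Data from the question
x <- data.frame(
  id   = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"),
  date = c("29/05/2013", "23/08/2011", "25/09/2011", "18/11/2011",
           "10/07/2013", "04/10/2011", "10/11/2011", "15/12/2011",
           "10/02/2008", "07/09/2009", "22/03/2012"),
  code = c(4, 4, 3942, 4, 10404, 3942, 10404, 10404, 3942, 10404, 3942)
)

res <- lapply(split(x, x$id), clear_data_by_priority,
              codes = c(4, 10404, 3942))
```

With 150 codes only the codes vector grows; the function body stays the same.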

Answer 2

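I'm not sure I fully understand your goal, but here is a solution that gives you a list of all combinations of id and code, with the complete records.

Using your data: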

A <- unique( x$code )
B <- unique( x$id )
z <- list( NULL )
k <- 1

for( i in A )
  for( j in B )
  {
    y <- ( x[ ( x$id == j & x$code == i ), ] )
    if( length( y[, 1 ] ) > 0 )
    {
      z[[ k ]] <- y
      names( z )[ k ] <- paste( i, "-", j, sep = "" )
      k <- k + 1
    }
  }

z
$`4-A`
  id       date code
1  A 29/05/2013    4
2  A 23/08/2011    4

$`4-B`
  id       date code
4  B 18/11/2011    4

$`3942-A`
  id       date code
3  A 25/09/2011 3942

$`3942-B`
  id       date code
6  B 04/10/2011 3942

$`3942-C`
   id       date code
9   C 10/02/2008 3942
11  C 22/03/2012 3942

$`10404-B`
  id       date  code
5  B 10/07/2013 10404
7  B 10/11/2011 10404

$`10404-C`
   id       date  code
8   C 15/12/2011 10404
10  C 07/09/2009 10404
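
The nested loop above can probably be replaced by a single split() on the combination of code and id. This is a sketch under the assumption that the order of the list elements does not matter (split() orders groups by factor level, so the elements come out in a different order than in the loop); the name by_pair is illustrative.

```r
# Data from the question
x <- data.frame(
  id   = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"),
  date = c("29/05/2013", "23/08/2011", "25/09/2011", "18/11/2011",
           "10/07/2013", "04/10/2011", "10/11/2011", "15/12/2011",
           "10/02/2008", "07/09/2009", "22/03/2012"),
  code = c(4, 4, 3942, 4, 10404, 3942, 10404, 10404, 3942, 10404, 3942)
)

# One group per (code, id) pair that actually occurs in the data;
# drop = TRUE discards combinations with no rows, sep = "-" mimics
# the "code-id" names built with paste() in the loop
pair <- interaction(x$code, x$id, sep = "-", drop = TRUE)
by_pair <- split(x, pair)

names(by_pair)
```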

Reducing the number of if statements in R
