Question

Suppose I have the following data frame (the real data set is very large):

df<- structure(list(x = c(1, 1, 1, 2, 2, 3, 3, 3), y = structure(c(1L, 
6L, NA, 2L, 4L, 3L, 7L, 5L), .Label = c("all", "fall", "hello", 
"hi", "me", "non", "you"), class = "factor"), z = structure(c(5L, 
NA, 4L, 2L, 1L, 6L, 3L, 4L), .Label = c("fall", "hi", "me", "mom", 
"non", "you"), class = "factor")), .Names = c("x", "y", "z"), row.names = c(NA, 
-8L), class = "data.frame")

which looks like this:

>df
  x     y    z
1 1   all  non
2 1   non <NA>
3 1  <NA>  mom
4 2  fall   hi
5 2    hi fall
6 3 hello  you
7 3   you   me
8 3    me  mom

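What I want to do is count the number of matching values within each group of x (1, 2 or 3). For example, group 1 has one matching value, "non" (NA should be ignored). The desired output is one count per group: 1 for group 1, 2 for group 2 and 2 for group 3.

I have been trying to think of a way to do this without a for loop, because my real data set is large, but I could not find one.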

Answer 1

This can be done with dplyr.
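As a minimal sketch (an assumption about how this might be written, not necessarily the exact code this answer had in mind), one could group by x and count the non-NA y values that also occur in z within the same group. The data frame is rebuilt here with character columns so the example is self-contained; the names `res` and `n` are purely illustrative.

```r
library(dplyr)

# Same values as the df in the question, with character columns instead of factors
df <- data.frame(
  x = c(1, 1, 1, 2, 2, 3, 3, 3),
  y = c("all", "non", NA, "fall", "hi", "hello", "you", "me"),
  z = c("non", NA, "mom", "hi", "fall", "you", "me", "mom"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(x) %>%
  # Within each group, count y values present in z; the !is.na(y) guard is
  # needed because NA %in% c(NA, ...) would otherwise count as a match
  summarise(n = sum(!is.na(y) & y %in% z))

print(res)
```

Because `%in%` compares factors by their labels, the same pipeline should also work on the factor columns of the original df.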

Answer 2

Just for some late-night fun, I tried a base R solution, which is admittedly ugly.

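# For each group of x, find the non-NA y values that also occur in that group's z
ind <- by(df, df$x, function(x) which(na.omit(x[["y"]]) %in% na.omit(x[["z"]])))
# Count the matches in each group
sm <- lapply(ind, length)
cbind(unique(df$x), sm)
    sm
1 1 1 
2 2 2 
3 3 2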

Another base R approach with less code (and, I hope, less ugly):

ind <- by(df, df$x, function(x) sum(na.omit(x[["y"]]) %in% na.omit(x[["z"]])))
cbind(unique(df$x), ind)
    ind
1 1   1
2 2   2
3 3   2

Answer 3

Here is a solution using by() and match():

do.call(rbind, by(df, df$x, function(g) c(x = g$x[1], n = sum(!is.na(match(g$y, g$z, incomparables = NA))))))
##   x n
## 1 1 1
## 2 2 2
## 3 3 2
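The incomparables argument of match() matters here because, by default, match() (and therefore %in%) treats an NA in the first vector as matching an NA in the second. A small illustration with made-up vectors `a` and `b` (not taken from the question's data):

```r
a <- c("non", NA)
b <- c(NA, "non")

# Default behaviour: the NA in `a` is matched to the NA in `b`
default_match <- match(a, b)

# With incomparables = NA, an NA in `a` is never reported as a match
strict_match <- match(a, b, incomparables = NA)

print(default_match)  # 2 1
print(strict_match)   # 2 NA
```

This is also why the %in%-based approaches need na.omit() or an explicit !is.na() check to avoid counting NA as a match.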

Matching values by group ID