Question

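Let me start by saying that I am new to R and am still trying to get the fundamentals down. At the moment I am working with a large data frame (called "ppl") that I have to edit to filter out certain rows. Each row belongs to a group and is characterized by an intensity (into) value and a sample value.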

       mz  rt      into   sample  tracker     sn   grp
 100.0153 126  2.762664      3    11908 7.522655   0
 100.0171 127  2.972048      2    5308  7.718521   0
 100.0788 272 30.217969      2    5309 19.024807   1
 100.0796 272 17.277916      3   11910  7.297716   1
 101.0042 128 37.557324      3   11916 27.991320   2
 101.0043 128 39.676014      2    5316 28.234918   2

So, the first question is: "How do I select the sample with the highest intensity from each group?" I tried a for loop:

for (i in ppl$grp) {
temp<-ppl[ppl$grp == i,]
sel<-rbind(sel,temp[max(temp$into),])
}

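The thing is, it works for ppl$grp == 0, but the following iterations return rows of NAs. In addition, the filtered data frame (called "sel") should also store the sample values of the rows that were removed. It should look like this: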

      mz  rt      into   sample  tracker     sn   grp
100.0171 127  2.972048   c(2,3)    5308  7.718521   0
100.0788 272 30.217969   c(2,3)    5309 19.024807   1
101.0043 128 39.676014   c(2,3)    5316 28.234918   2

To get this, I would use the following approach:

lev<-factor(ppl$grp)
samp<-ppl$sample
samp2<-split(samp,lev)
sel$sample<-samp2

Any hints? Since I have not solved the first problem yet, I cannot test it.

Thanks a lot.

Answer 1

Not sure I follow your question, but maybe this will get you started.

library(dplyr)
ppl %>% group_by(grp) %>% filter(into == max(into))

Answer 2

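An option using ave from base R is

ppl[with(ppl, ave(into, grp, FUN = max) == into), ]

If the 'sample' column in the expected output should hold the unique elements in each 'grp', then after grouping by 'grp', update 'sample' to the pasted unique elements of 'sample', then arrange 'into' in descending order and keep the 1st row of each group.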
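The code for that second step is not preserved in this answer. As a minimal sketch of the steps just described, assuming dplyr is installed: the choice of toString, sort and slice is an assumption, not necessarily what the answer originally used, and the data frame is rebuilt from the six rows in the question so the example runs on its own.

```r
library(dplyr)

# The six example rows from the question.
ppl <- data.frame(
  mz      = c(100.0153, 100.0171, 100.0788, 100.0796, 101.0042, 101.0043),
  rt      = c(126, 127, 272, 272, 128, 128),
  into    = c(2.762664, 2.972048, 30.217969, 17.277916, 37.557324, 39.676014),
  sample  = c(3, 2, 2, 3, 3, 2),
  tracker = c(11908, 5308, 5309, 11910, 11916, 5316),
  sn      = c(7.522655, 7.718521, 19.024807, 7.297716, 27.991320, 28.234918),
  grp     = c(0, 0, 1, 1, 2, 2)
)

sel <- ppl %>%
  group_by(grp) %>%
  # Replace 'sample' with one string listing every sample seen in the group.
  mutate(sample = toString(sort(unique(sample)))) %>%
  # Sort so that the most intense row comes first, then keep one row per group.
  arrange(desc(into)) %>%
  slice(1) %>%
  ungroup() %>%
  arrange(grp)

print(sel)
```

Note that 'sample' becomes a character column here (for example "2, 3") rather than a numeric vector per row.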

Answer 3

library(data.table)
setkey(setDT(ppl),grp)
ppl <- ppl[ppl[,into==max(into),by=grp]$V1,]
##          mz  rt      into sample tracker        sn grp
##1: 100.0171 127  2.972048      2    5308  7.718521   0
##2: 100.0788 272 30.217969      2    5309 19.024807   1
##3: 101.0043 128 39.676014      2    5316 28.234918   2

Alternative:
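The alternative this answer originally gave is not preserved here. One commonly used data.table idiom for the same task, offered only as an assumption about what might fit and not as the answer's own code, is to collect the row index of each group's maximum with .I and which.max. The data are rebuilt from the question so the sketch runs on its own.

```r
library(data.table)

# The six example rows from the question.
ppl <- data.table(
  mz      = c(100.0153, 100.0171, 100.0788, 100.0796, 101.0042, 101.0043),
  rt      = c(126, 127, 272, 272, 128, 128),
  into    = c(2.762664, 2.972048, 30.217969, 17.277916, 37.557324, 39.676014),
  sample  = c(3, 2, 2, 3, 3, 2),
  tracker = c(11908, 5308, 5309, 11910, 11916, 5316),
  sn      = c(7.522655, 7.718521, 19.024807, 7.297716, 27.991320, 28.234918),
  grp     = c(0, 0, 1, 1, 2, 2)
)

# .I holds row numbers in the full table; which.max picks the first maximum per group.
idx <- ppl[, .I[which.max(into)], by = grp]$V1
top <- ppl[idx]
print(top)
```

Unlike the logical-vector version, this does not depend on the table being sorted by grp, and it returns exactly one row per group even when there are ties.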

Answer 4

I don't see why this code would work:

for (i in ppl$grp) {
  temp<-ppl[ppl$grp == i,]
  sel<-rbind(sel,temp[max(temp$into),])
}

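max(temp$into) returns the maximum value itself, which in most cases is not even an integer, yet the loop uses it as a row index. R truncates a non-integer index, so for group 0 the maximum 2.972048 is read as row 2, which happens to be the right row. For group 1 the maximum 30.217969 asks for row 30, which does not exist, so you get a row of NAs.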
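To make the indexing problem concrete, here is a small sketch on the question's data for group 1. It contrasts max, which gives a value, with which.max, which gives a position. The names bad and best are only for illustration.

```r
# The six example rows from the question.
ppl <- data.frame(
  mz      = c(100.0153, 100.0171, 100.0788, 100.0796, 101.0042, 101.0043),
  rt      = c(126, 127, 272, 272, 128, 128),
  into    = c(2.762664, 2.972048, 30.217969, 17.277916, 37.557324, 39.676014),
  sample  = c(3, 2, 2, 3, 3, 2),
  tracker = c(11908, 5308, 5309, 11910, 11916, 5316),
  sn      = c(7.522655, 7.718521, 19.024807, 7.297716, 27.991320, 28.234918),
  grp     = c(0, 0, 1, 1, 2, 2)
)

temp <- ppl[ppl$grp == 1, ]

# max() gives the largest intensity, not where it sits in temp.
# Used as a row index it points far past the two rows of temp.
bad <- temp[max(temp$into), ]
print(bad)    # every column is NA

# which.max() gives the position of the (first) maximum, which is a valid row index.
best <- temp[which.max(temp$into), ]
print(best)   # the row with tracker 5309
```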

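Also, building up a data.frame with rbind on every pass through a for loop is not good practice (in any language). It forces repeated type checking and array growth, which can get quite expensive.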
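One way to avoid growing sel inside the loop, sketched here in base R under the assumption that a single row per group is wanted, is to split the data frame by group, pick one row from each piece, and bind the pieces once at the end.

```r
# The six example rows from the question.
ppl <- data.frame(
  mz      = c(100.0153, 100.0171, 100.0788, 100.0796, 101.0042, 101.0043),
  rt      = c(126, 127, 272, 272, 128, 128),
  into    = c(2.762664, 2.972048, 30.217969, 17.277916, 37.557324, 39.676014),
  sample  = c(3, 2, 2, 3, 3, 2),
  tracker = c(11908, 5308, 5309, 11910, 11916, 5316),
  sn      = c(7.522655, 7.718521, 19.024807, 7.297716, 27.991320, 28.234918),
  grp     = c(0, 0, 1, 1, 2, 2)
)

# One data frame per group, named by the group value.
groups <- split(ppl, ppl$grp)

# Keep the row with the highest intensity in each group.
rows <- lapply(groups, function(g) g[which.max(g$into), ])

# Bind all selected rows in a single call instead of once per iteration.
sel <- do.call(rbind, rows)
print(sel)
```

Splitting also visits each group once, whereas looping over ppl$grp repeats the work for every row that shares a group.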

Also, max will return NA when the group contains any NA.
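A quick illustration of that behaviour, using a made-up vector (the values are hypothetical, not taken from ppl):

```r
# Hypothetical intensities with one missing value.
x <- c(2.76, NA, 2.97)

with_na    <- max(x)                # NA, because one element is missing
without_na <- max(x, na.rm = TRUE)  # 2.97, missing values are dropped first
position   <- which.max(x)          # 3, which.max skips missing values

print(with_na)
print(without_na)
print(position)
```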

There is also the question of what you want to do about ties. Do you want only one result or all of them? The code Akrun gave will give you all of them.

This code will write a new column containing the max of the group

 ppl$grpmax <- ave(ppl$into, ppl$grp, FUN=function(x) { max(x, na.rm=TRUE ) } )

Then you can select all the values in each group that equal the max using

pplmax <- subset(ppl, into == grpmax)

If you only want one per group, you can then remove the duplicates

pplmax[!duplicated(pplmax$grp),]

R for loop fails when applying the max function

4 answers: