I'm currently working with a data frame that looks like this:
Site Spp1 Spp2 Spp3 LOC TYPE
S01 2 4 0 A FLOOD
S02 4 0 0 A REG
....
S10 0 1 0 B FLOOD
S11 1 0 0 B REG
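What I'm trying to do is subset the data frame so that I can run some indicator species analysis in R.
The following code works: I create two subsets of the data, combine them into one frame, and then drop the unused factor levels.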
library(dplyr)
A.flood <- filter(data, TYPE == "FLOOD", LOC == "A")
B.flood <- filter(data, TYPE == "FLOOD", LOC == "B")
# combine both subsets, then drop factor levels that no longer occur (e.g. "REG")
A.B.flood <- rbind(A.flood, B.flood) %>% droplevels()
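Because both subsets share `TYPE == "FLOOD"`, a single `filter()` call with `%in%` should select the same rows, possibly in a different order (`rbind()` stacks the A rows before the B rows, while `filter()` keeps the original row order). A minimal sketch, using a small hypothetical data frame shaped like the one above:

```r
library(dplyr)

# Hypothetical toy data shaped like the question's data frame
data <- data.frame(
  Site = c("S01", "S02", "S10", "S11"),
  Spp1 = c(2L, 4L, 0L, 1L),
  Spp2 = c(4L, 0L, 1L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = factor(c("A", "A", "B", "B")),
  TYPE = factor(c("FLOOD", "REG", "FLOOD", "REG"))
)

# One filter() call replaces the two filter() calls plus rbind()
A.B.flood <- data %>%
  filter(TYPE == "FLOOD", LOC %in% c("A", "B")) %>%
  droplevels()   # drops the now-unused "REG" level of TYPE
```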
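What I also want/need to do is remove all Spp columns (there are ~60 in my real dataset) that sum to zero. Is there a way to do this with dplyr, and if so, can that code be piped onto the existing A.B.flood data frame code?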
Thanks!
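EDIT
I managed to remove all the columns that sum to zero by selecting only the columns whose sum is > 0 (non-numeric columns such as Site, LOC and TYPE are kept):
keep <- vapply(A.B.flood, function(col) !is.numeric(col) || sum(col) != 0, logical(1))
A.B.flood.subset <- A.B.flood[, keep]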
Answer 0 (score: 5)
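Without using any packages, we can take the rowSums of the 'Spp' columns (selecting those columns with grep) and negate it twice, so that rows with sum > 0 become TRUE and the rest FALSE. We then use this index to subset the rows.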
data[!!rowSums(data[grep('Spp', names(data))]),]
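Note that this indexes rows: it drops sites whose Spp counts are all zero. For the column-wise removal the question asks about, the same `grep()` idea can be combined with `colSums()` instead. A base-R sketch, assuming non-negative counts and a hypothetical toy data frame:

```r
# Hypothetical toy data: Spp3 is all zeros, and site S03 has no species
data <- data.frame(
  Site = c("S01", "S02", "S03", "S04"),
  Spp1 = c(2L, 4L, 0L, 4L),
  Spp2 = c(4L, 0L, 0L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = c("A", "A", "A", "A"),
  TYPE = c("FLOOD", "REG", "FLOOD", "REG")
)

spp <- grep("Spp", names(data))   # positions of the species columns

# Row-wise (as in the answer): !! turns each row sum into TRUE (> 0) or FALSE (0)
rows_kept <- data[!!rowSums(data[spp]), ]

# Column-wise: drop only the species columns whose total is zero
zero_cols <- spp[colSums(data[spp]) == 0]
# guard: data[-integer(0)] would select no columns at all
cols_kept <- if (length(zero_cols)) data[-zero_cols] else data
```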
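Or, using dplyr/magrittr, we select the 'Spp' columns, get the sum of each row with Reduce, apply the double negation, and use extract from magrittr to subset the original dataset with the derived index.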
library(dplyr)
library(magrittr)
data %>%
select(matches('^Spp')) %>%
Reduce(`+`, .) %>%
`!` %>%
`!` %>%
extract(data,.,)
# example data
data <- structure(list(Site = c("S01", "S02", "S03", "S04"),
                       Spp1 = c(2L, 4L, 0L, 4L),
                       Spp2 = c(4L, 0L, 0L, 0L),
                       Spp3 = c(0L, 0L, 0L, 0L),
                       LOC = c("A", "A", "A", "A"),
                       TYPE = c("FLOOD", "REG", "FLOOD", "REG")),
                  .Names = c("Site", "Spp1", "Spp2", "Spp3", "LOC", "TYPE"),
                  class = "data.frame", row.names = c(NA, -4L))
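On this example data, the base-R and dplyr/magrittr versions of the answer should return the same three rows (S03, whose species counts are all zero, is dropped). A quick check, assuming dplyr and magrittr are installed:

```r
library(dplyr)
library(magrittr)

data <- data.frame(
  Site = c("S01", "S02", "S03", "S04"),
  Spp1 = c(2L, 4L, 0L, 4L),
  Spp2 = c(4L, 0L, 0L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = c("A", "A", "A", "A"),
  TYPE = c("FLOOD", "REG", "FLOOD", "REG")
)

base_result <- data[!!rowSums(data[grep("Spp", names(data))]), ]

pipe_result <- data %>%
  select(matches("^Spp")) %>%
  Reduce(`+`, .) %>%   # element-wise sum of the Spp columns, i.e. row sums
  `!` %>%
  `!` %>%              # double negation: nonzero -> TRUE, zero -> FALSE
  extract(data, ., )   # magrittr::extract is `[`, so this is data[index, ]
```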
Answer 1 (score: 3)
I realize this question is quite old now, but I came up with another solution using dplyr's select() and which(), which may seem clearer to dplyr fans:
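A minimal sketch of the select() plus which() idea, assuming the zero-sum test is applied only to numeric columns (an illustrative reconstruction on hypothetical toy data, not necessarily the answerer's exact code):

```r
library(dplyr)

# Hypothetical toy data with the same shape as the question's A.B.flood
A.B.flood <- data.frame(
  Site = c("S01", "S02", "S03", "S04"),
  Spp1 = c(2L, 4L, 0L, 4L),
  Spp2 = c(4L, 0L, 0L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = c("A", "A", "A", "A"),
  TYPE = c("FLOOD", "REG", "FLOOD", "REG")
)

# TRUE for columns to keep: non-numeric columns, or numeric columns with a nonzero sum
keep <- vapply(A.B.flood, function(col) !is.numeric(col) || sum(col) != 0, logical(1))

# which() turns the logical vector into column positions that select() accepts
A.B.flood.subset <- A.B.flood %>% select(which(keep))
```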
Answer 2 (score: 2)
You should use tidyr::gather() to convert to tidy data; the data frame will then be much easier to manipulate.
library(tidyr)
library(dplyr)
A.B.flood %>% gather(Species, Sp.Count, -Site, -LOC, -TYPE) %>%
group_by(Species) %>%
filter(Sp.Count > 0)
Voila, your tidy data minus the zero counts.
# Site LOC TYPE Species Sp.Count
# <fctr> <fctr> <fctr> <chr> <int>
#1 S01 A FLOOD Spp1 2
#2 S02 A REG Spp1 4
#3 S11 B REG Spp1 1
#4 S01 A FLOOD Spp2 4
#5 S10 B FLOOD Spp2 1
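Personally, I'd keep it like this. If you want the original format back, with zero counts for the species that were not dropped, just add %>% spread(Species, Sp.Count, fill = 0) to the pipeline.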
# Site LOC TYPE Spp1 Spp2
#* <fctr> <fctr> <fctr> <dbl> <dbl>
#1 S01 A FLOOD 2 4
#2 S02 A REG 4 0
#3 S10 B FLOOD 0 1
#4 S11 B REG 1 0
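In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same round trip with the newer functions, on hypothetical toy data. One caveat this makes visible: a site whose species counts are all zero (S03 here) has no rows left after the filter, so it disappears from the widened result.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: Spp3 is all zeros, and site S03 has no species
A.B.flood <- data.frame(
  Site = c("S01", "S02", "S03", "S04"),
  Spp1 = c(2L, 4L, 0L, 4L),
  Spp2 = c(4L, 0L, 0L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = c("A", "A", "A", "A"),
  TYPE = c("FLOOD", "REG", "FLOOD", "REG")
)

wide <- A.B.flood %>%
  pivot_longer(starts_with("Spp"), names_to = "Species", values_to = "Sp.Count") %>%
  filter(Sp.Count > 0) %>%   # drop zero counts, so all-zero species vanish
  pivot_wider(names_from = Species, values_from = Sp.Count, values_fill = 0)
```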
Answer 3 (score: 1)
There's a simpler and faster way (and one more in keeping with your question, since it uses dplyr).
A.B.flood.subset <- A.B.flood %>% .[, colSums(. != 0) > 0]
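This version counts nonzero cells per column rather than summing values, which gives the same result as a zero-sum test when counts are non-negative. Identifier columns survive because character cells are compared as text (`"S01" != 0` is TRUE). A quick check on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data: Spp3 is all zeros
A.B.flood <- data.frame(
  Site = c("S01", "S02", "S03", "S04"),
  Spp1 = c(2L, 4L, 0L, 4L),
  Spp2 = c(4L, 0L, 0L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = c("A", "A", "A", "A"),
  TYPE = c("FLOOD", "REG", "FLOOD", "REG")
)

# `. != 0` yields a logical matrix; colSums() counts the nonzero cells per column,
# and `[` keeps only the columns with at least one nonzero cell
A.B.flood.subset <- A.B.flood %>% .[, colSums(. != 0) > 0]
```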
Answer 4 (score: 1)
For those who want to use dplyr 1.0.0 or later with the where() selection helper, you can do this:
A.B.flood %>%
select(where( ~ is.numeric(.x) && sum(.x) != 0))
Returns:
Spp1 Spp2
1 2 4
2 4 0
3 0 0
4 4 0
Using the same data given by @akrun:
A.B.flood <- structure(
  list(
    Site = c("S01", "S02", "S03", "S04"),
    Spp1 = c(2L, 4L, 0L, 4L),
    Spp2 = c(4L, 0L, 0L, 0L),
    Spp3 = c(0L, 0L, 0L, 0L),
    LOC = c("A", "A", "A", "A"),
    TYPE = c("FLOOD", "REG", "FLOOD", "REG")
  ),
  .Names = c("Site", "Spp1", "Spp2", "Spp3", "LOC", "TYPE"),
  class = "data.frame", row.names = c(NA, -4L))
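The where() call above returns only the species columns, because the identifier columns fail the is.numeric() test. To keep Site, LOC and TYPE and chain the step straight onto the question's subsetting code, the test can be flipped so non-numeric columns always pass. A sketch with hypothetical toy data shaped like the question's:

```r
library(dplyr)  # where() requires dplyr >= 1.0.0

# Hypothetical toy data shaped like the question's data frame
data <- data.frame(
  Site = c("S01", "S02", "S10", "S11"),
  Spp1 = c(2L, 4L, 0L, 1L),
  Spp2 = c(4L, 0L, 1L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = factor(c("A", "A", "B", "B")),
  TYPE = factor(c("FLOOD", "REG", "FLOOD", "REG"))
)

A.B.flood <- data %>%
  filter(TYPE == "FLOOD", LOC %in% c("A", "B")) %>%
  droplevels() %>%
  # keep non-numeric (identifier) columns, plus numeric columns that do not sum to zero
  select(where(~ !is.numeric(.x) || sum(.x) != 0))
```

Here Spp3 sums to zero within the FLOOD subset, so it is dropped after filtering.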