Subset each group to the rows closest to a defined value

Time: 2016-03-22 16:36:48

Tags: r dplyr subset

I have a data frame, and within each group I would like to select the rows closest to a specific value (e.g. 5).

set.seed(1234)
df <- data.frame(x = c(rep("A", 4), rep("B", 4)),
                 y = c(rep(4, 2), rep(1, 2), rep(6, 2), rep(3, 2)),
                 z = rnorm(8))
df
##   x y          z
## 1 A 4 -1.2070657
## 2 A 4  0.2774292
## 3 A 1  1.0844412
## 4 A 1 -2.3456977
## 5 B 6  0.4291247
## 6 B 6  0.5060559
## 7 B 3 -0.5747400
## 8 B 3 -0.5466319

The result would be:

##   x y          z
## 1 A 4 -1.2070657
## 2 A 4  0.2774292
## 3 B 6  0.4291247
## 4 B 6  0.5060559

Thanks, Philippe

4 answers:

Answer 0 (score: 4)

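library(dplyr)

df %>%
  group_by(x) %>%
  # distance of each y from the target value 5
  mutate(delta = abs(y - 5)) %>%
  # keep every row tied for the smallest distance within its group
  filter(delta == min(delta)) %>%
  # drop the helper column
  select(-delta)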
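On dplyr 1.0.0 or later (an assumption about the installed version), slice_min() can express the same filter without a helper column. Because with_ties = TRUE is the default, both tied rows in each group are kept, which matches the expected result. This is a sketch of an alternative, not part of the original answer.

```r
# Assumes dplyr >= 1.0.0, which introduced slice_min()
library(dplyr)

set.seed(1234)
df <- data.frame(x = c(rep("A", 4), rep("B", 4)),
                 y = c(rep(4, 2), rep(1, 2), rep(6, 2), rep(3, 2)),
                 z = rnorm(8))

# Within each group of x, keep the row(s) whose y is closest to 5;
# with_ties = TRUE keeps every row that ties for the minimum distance.
res <- df %>%
  group_by(x) %>%
  slice_min(abs(y - 5), n = 1, with_ties = TRUE) %>%
  ungroup()

res
```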

Answer 1 (score: 3)

Or, using base R:

# TRUE where y is closest to 5 within its group of x
df[do.call(c, tapply(df$y, df$x, function(y) abs(y - 5) == min(abs(y - 5)))), ]
  x y          z
1 A 4 -1.2070657
2 A 4  0.2774292
5 B 6  0.4291247
6 B 6  0.5060559
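The do.call(c, tapply(...)) construction returns the logical flags concatenated in group order, so they line up with the rows only when df is already sorted by x. A base R sketch that avoids that assumption uses ave(), whose result is aligned with the original row positions. The scrambled copy (shuffled, a name used only for illustration) shows that the row order no longer matters.

```r
set.seed(1234)
df <- data.frame(x = c(rep("A", 4), rep("B", 4)),
                 y = c(rep(4, 2), rep(1, 2), rep(6, 2), rep(3, 2)),
                 z = rnorm(8))

target <- 5
delta <- abs(df$y - target)

# ave() computes min() within each group of x and writes the result back
# onto the original row positions, so the comparison stays row-aligned.
keep <- delta == ave(delta, df$x, FUN = min)
res <- df[keep, ]

# The same logic on rows in a scrambled order still selects rows 1, 2, 5, 6.
shuffled <- df[c(5, 1, 8, 3, 6, 2, 7, 4), ]
d2 <- abs(shuffled$y - target)
res2 <- shuffled[d2 == ave(d2, shuffled$x, FUN = min), ]

res
res2
```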

Answer 2 (score: 1)

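Here is an option using data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'x', compute the absolute difference between 'y' and 5, check which elements equal the minimum of that difference, get their row indices (.I), extract the row-index column ("V1"), and use it to subset the dataset.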

library(data.table)
setDT(df)[df[, {v1 <- abs(y-5)
               .I[v1==min(v1)]}, x]$V1]
#   x y          z
#1: A 4 -1.2070657
#2: A 4  0.2774292
#3: B 6  0.4291247
#4: B 6  0.5060559
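A more compact data.table variant, offered as an alternative rather than the answer's method, subsets .SD inside each group. It is often reported to be slower than the .I approach on tables with many groups, because .SD is subset once per group; it also places the grouping column first in the result (already the case here). It assumes the data.table package is installed.

```r
library(data.table)

set.seed(1234)
dt <- data.table(x = c(rep("A", 4), rep("B", 4)),
                 y = c(rep(4, 2), rep(1, 2), rep(6, 2), rep(3, 2)),
                 z = rnorm(8))

# For each group of x, keep the rows of .SD (the group's non-grouping
# columns) whose y is closest to 5; all tied rows are kept.
res <- dt[, .SD[abs(y - 5) == min(abs(y - 5))], by = x]

res
```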

Answer 3 (score: 0)

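val <- 5
# distance of each y from the target value
delta <- abs(val - df$y)
# compare against the minimum distance within each group of x, not overall
df <- df[delta == ave(delta, df$x, FUN = min), ]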
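All of the answers keep every row that ties for the smallest distance. If exactly one row per group is wanted instead (a variation on the question, not something it asks for), one base R sketch orders by group and distance, then keeps the first row of each group with duplicated(). Because order() leaves unresolved ties in their original order, ties are broken by the original row order. The variable names here are illustrative.

```r
set.seed(1234)
df <- data.frame(x = c(rep("A", 4), rep("B", 4)),
                 y = c(rep(4, 2), rep(1, 2), rep(6, 2), rep(3, 2)),
                 z = rnorm(8))

target <- 5

# Sort by group, then by distance to the target; rows that tie on distance
# stay in their original order.
ord <- order(df$x, abs(df$y - target))
sorted <- df[ord, ]

# Keep only the first (closest) row of each group.
one_per_group <- sorted[!duplicated(sorted$x), ]

one_per_group
```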