Question

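I searched this site but could not find an answer that fully addresses my question. I have a data frame, data_frame, with 258 observations of 50 variables (a sample is shown here):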

pdis_reel  distance  corde  date_course  spe  ssp  code_hippo  libel_hippo
3000       3000      G      25/03/15     T    A    1303        Marseille-Borély
2625       2625      D      18/03/15     T    A    4701        Agen
2950       2925      G      17/02/15     T    A    601         Cagnes-sur-Mer
2675       2650      G      19/01/15     T    A    1302        Marseille-Vivaux
2650       2650      G      29/11/14     T    A    1302        Marseille-Vivaux
3250       3225      D      09/11/14     T    A    4203        Saint-Galmier
3025       3000      G      29/10/14     T    A    1303        Marseille-Borély
2625       2600      D      04/10/14     T    A    303         Moulins
2875       2850      G      28/09/14     T    A    6901        Lyon-Parilly
2600       2600      D      10/09/14     T    A    8404        Cavaillon
4175       4150      D      06/09/14     T    A    7513        Vichy
2675       2675      G      17/08/14     T    A    102         Divonne-les-Bains
2700       2700      D      03/08/14     T    A    7301        Aix-les-Bains
2875       2850      G      04/07/14     T    A    4201        Feurs
2300       2300      G      21/05/14     T    A    1303        Marseille-Borély
2650       2650      D      03/05/14     T    A    8301        Hyères
2650       2650      D      27/04/14     T    A    401         Oraison
2850       2850      G      22/04/14     T    A    6901        Lyon-Parilly

I want to extract all rows that come closest to matching a condition, for example:

centpourcent <- subset(data_frame, corde == "D" & pdis_reel == 2900+-200)

When I look at centpourcent, it has 258 rows and 0 columns. Why?

Also, I don't understand why using subset() in scripts or programs is discouraged.

Answer 1

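(As requested) I'll try to explain what's wrong with your code.

When you write something like 2900 + -200, you are basically telling R to take the number on the LHS and add to it the number on the RHS, with a minus sign.

In other words: 2900 - 200 = 2700. R has no "plus or minus" operator, so the comparison only matches values that are exactly 2700.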
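A quick way to check this at the console. This is only a minimal sketch: the vector x and the names exact and in_range are made up for illustration and are not part of your data.

```r
# R parses 2900+-200 as 2900 + (-200): a binary plus followed by a unary minus
2900 + -200
## [1] 2700

# Made-up distances, for illustration only
x <- c(2625, 2700, 2950, 3100)

# The original comparison is therefore an exact match against 2700
exact <- x == 2900 + -200
exact
## [1] FALSE  TRUE FALSE FALSE

# "2900 plus or minus 200" has to be spelled out as a range check
in_range <- abs(x - 2900) <= 200
in_range
## [1] FALSE  TRUE  TRUE  TRUE
```

The abs() form is equivalent to writing x >= 2700 & x <= 3100.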

Consider the following example data

set.seed(123)
test <- sample(150, 20, replace  = TRUE)

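Suppose the condition we want is "test is 60, plus or minus 20", which you would have written as test == 60 + -20

We could do any of the following: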

test[test >= 40 & test <= 80]
## [1] 44 62 80 69 69 50

Or

between <- function(x, lower, upper) x[x >= lower & x <= upper]
between(test, 40, 80)
## [1] 44 62 80 69 69 50

Or

'%between%' <- function(x, y) x[x >= y[1] & x <= y[2]]
test %between% c(40, 80)
## [1] 44 62 80 69 69 50

Or just load the dplyr or data.table package, both of which provide a between function that does the same kind of range check.
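Applied to the data in the question, the same idea might look like the sketch below. It uses base R only and rebuilds just five rows and three columns of the posted sample; the names target and tol are hypothetical helpers introduced here, and the 200 tolerance is taken from the 2900+-200 in the question.

```r
# A few rows copied from the sample in the question (three columns only)
data_frame <- data.frame(
  pdis_reel   = c(2625, 2950, 3250, 2875, 2700),
  corde       = c("D", "G", "D", "G", "D"),
  libel_hippo = c("Agen", "Cagnes-sur-Mer", "Saint-Galmier",
                  "Lyon-Parilly", "Aix-les-Bains"),
  stringsAsFactors = FALSE
)

target <- 2900  # hypothetical name: centre of the wanted range
tol    <- 200   # hypothetical name: allowed distance from the centre

# Keep rows with corde "D" whose pdis_reel lies within target +/- tol
keep <- data_frame$corde == "D" & abs(data_frame$pdis_reel - target) <= tol
centpourcent <- data_frame[keep, ]
centpourcent
```

On these five rows only Aix-les-Bains (2700, corde D) is kept. With dplyr loaded, filter(data_frame, corde == "D", between(pdis_reel, 2700, 3100)) should select the same rows.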
