Using lapply to identify which bin a specific value falls in

Time: 2018-08-29 08:44:31

Tags: r lapply binning

The dataset is:

badData <- list(c(296,310), c(330,335), c(350,565))
df <- data.frame(wavelength = seq(300,360,5.008667),
                  reflectance = seq(-1,-61,-5.008667))
df    
   wavelength reflectance
   300.0000   -1.000000
   305.0087   -6.008667
   310.0173  -11.017334
   315.0260  -16.026001
   320.0347  -21.034668
   325.0433  -26.043335
   330.0520  -31.052002
   335.0607  -36.060669
   340.0693  -41.069336
   345.0780  -46.078003
   350.0867  -51.086670
   355.0953  -56.095337

The original question was how to determine whether wavelength falls within any of the ranges given by badData. The solution provided was https://stackoverflow.com/a/52070363/1012249

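My question is: using similar syntax, how can I identify which badData bin each value belongs to? Suppose badData is structured as follows, and the bins do not overlap.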

badData <- data.frame(bin=c('a','b','c'),start= c(296,330,350),end=c(310.01,335,565))

2 Answers:

Answer 0: (score: 2)

Here is an example using a fuzzy join:

library(dplyr)     # provides the %>% pipe
library(fuzzyjoin)
df %>%
  fuzzy_left_join(badData, # join badData to df
                  by = c("wavelength" = "start", # variables to join by
                         "wavelength" = "end"),
                  match_fun = list(`>=`, `<=`)) # one function per pair of variables: wavelength >= start and wavelength <= end
#output
   wavelength reflectance  bin start    end
1    300.0000   -1.000000    a   296 310.01
2    305.0087   -6.008667    a   296 310.01
3    310.0173  -11.017334 <NA>    NA     NA
4    315.0260  -16.026001 <NA>    NA     NA
5    320.0347  -21.034668 <NA>    NA     NA
6    325.0433  -26.043335 <NA>    NA     NA
7    330.0520  -31.052002    b   330 335.00
8    335.0607  -36.060669 <NA>    NA     NA
9    340.0693  -41.069336 <NA>    NA     NA
10   345.0780  -46.078003 <NA>    NA     NA
11   350.0867  -51.086670    c   350 565.00
12   355.0953  -56.095337    c   350 565.00
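Since the question asks for lapply-style syntax, the same lookup can also be sketched in base R without extra packages. This is only a minimal sketch: `which_bin` is a hypothetical helper name, not part of any library, and it assumes closed intervals on both sides (`start <= wavelength <= end`) and non-overlapping bins, as stated in the question.

```r
# Same data as in the question
badData <- data.frame(bin = c("a", "b", "c"),
                      start = c(296, 330, 350),
                      end = c(310.01, 335, 565),
                      stringsAsFactors = FALSE)
df <- data.frame(wavelength = seq(300, 360, 5.008667),
                 reflectance = seq(-1, -61, -5.008667))

# Hypothetical helper: label of the bin containing w, or NA if w is in no bin.
# Assumes closed intervals and non-overlapping bins.
which_bin <- function(w, bins) {
  hit <- which(w >= bins$start & w <= bins$end)
  if (length(hit) == 0) NA_character_ else bins$bin[hit[1]]
}

# vapply is the type-checked sibling of lapply/sapply
df$bin <- vapply(df$wavelength, which_bin, character(1), bins = badData)
```

The result should match the `bin` column of the fuzzy join above; for large data the join or `cut` is likely to be faster than calling a function per row.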

Answer 1: (score: 1)

You don't need a loop. You can simply use cut:

badData <- data.frame(bin=c('a','b','c'),start= c(296,330,350),end=c(310.01,335,565))
df <- data.frame(wavelength = seq(300,360,5.008667),
                 reflectance = seq(-1,-61,-5.008667))

df$bins <- cut(df$wavelength, t(badData[, c("start", "end")]), 
               labels = head(c(t(cbind(as.character(badData$bin), "good"))), -1))
#   wavelength reflectance bins
#1    300.0000   -1.000000    a
#2    305.0087   -6.008667    a
#3    310.0173  -11.017334 good
#4    315.0260  -16.026001 good
#5    320.0347  -21.034668 good
#6    325.0433  -26.043335 good
#7    330.0520  -31.052002    b
#8    335.0607  -36.060669 good
#9    340.0693  -41.069336 good
#10   345.0780  -46.078003 good
#11   350.0867  -51.086670    c
#12   355.0953  -56.095337    c

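You haven't said which side of each interval should be open or closed, but that can be adjusted (for example through the right argument of cut).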
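To illustrate how the open and closed sides change the result, here is a small sketch using the same breaks and labels as above, written out as plain vectors. The test values in `x` are made up for illustration and sit exactly on bin boundaries.

```r
breaks <- c(296, 310.01, 330, 335, 350, 565)
labels <- c("a", "good", "b", "good", "c")

# Illustrative values lying exactly on the boundaries
x <- c(296, 310.01, 330, 335)

# Default right = TRUE: intervals are (lower, upper]
right_closed <- as.character(cut(x, breaks, labels = labels))
# NA "a" "good" "b"

# right = FALSE: intervals are [lower, upper)
left_closed <- as.character(cut(x, breaks, labels = labels, right = FALSE))
# "a" "good" "b" "good"
```

Because `cut` treats the bad ranges and the gaps between them as adjacent intervals, a boundary value always goes to exactly one side. If both `start` and `end` need to count as inside a bin, an explicit comparison such as `wavelength >= start & wavelength <= end` may be a better fit than `cut`.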