R: best-match comparison

Time: 2017-10-16 21:28:28

Tags: r loops for-loop match

I have a long-format data frame like the following:

df <- structure(list(Date =c("2011-01", "2011-08", "2012-03", "2011-01", "2011-08", "2011-01", "2011-08", "2011-01", "2011-08", 
                   "2011-01", "2011-08", "2012-03", "2011-01", "2011-08", "2011-01", "2011-08", "2011-01", "2011-08",
                   "2011-01", "2011-08", "2012-03", "2011-01", "2011-08", "2011-01", "2011-08", "2011-01", "2011-08"),
     Part=c("A", "A", "A", "A", "A", "A", "A", "A", "A",
            "B", "B", "B", "B", "B", "B", "B", "B", "B", 
            "C", "C", "C", "C", "C", "C", "C", "C", "C"),
     method=c("Type1","Type1","Type1","Type2","Type2","Type3","Type3","Type4","Type4",
              "Type1","Type1","Type1","Type2","Type2","Type3","Type3","Type4","Type4",
              "Type1","Type1","Type1","Type2","Type2","Type3","Type3","Type4","Type4"),
     value= c(4L, 46L, 43L, 9L, 8L, 46L, 63L, 84L, 2L, 5L, 78L, 2L, 89L, 2L, 6L, 62L, 25L, 46L, 3L, 4L, 7L, 24L, 13L, 21L, 19L, 8L, 3L)),
     class= "data.frame", row.names=c(NA, -27L))

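I want to create another column called BestMethod. For each Part and Date, it should list the method(s) whose value is closest to the Type3 value.

For example, for Part A in 2011-01, Types 1, 2 and 3 were applied and Type1 is the closest to Type3, so BestMethod would be Type1. If not all three types were applied, I would put NA.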

In Excel, it would look like this:

=INDEX(C2:F2, MATCH(MIN(ABS(C2:F2-B2)), ABS(C2:F2-B2),0))

and then this:

=IF(B2="", "NA", INDEX($C$1:$F$1,1,MATCH(H2,C2:F2,0)))
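For comparison, here is a rough R analogue of the two Excel formulas, as a minimal base-R sketch. It assumes the sheet holds one Part/Date per row with the method values side by side, which is the shape `tapply()` builds below; only Part A's rows are used, and the names `long`, `wide` and `best` are illustrative. Unlike `MATCH(..., 0)`, which returns only the first match, it keeps every tied method, and it returns NA when a row has no Type3 value.

```r
# Part A rows from the question's data frame (long format)
long <- data.frame(
  Part   = "A",
  Date   = c("2011-01", "2011-08", "2012-03", "2011-01", "2011-08",
             "2011-01", "2011-08", "2011-01", "2011-08"),
  method = c("Type1", "Type1", "Type1", "Type2", "Type2",
             "Type3", "Type3", "Type4", "Type4"),
  value  = c(4, 46, 43, 9, 8, 46, 63, 84, 2)
)

# Wide matrix: one row per Part/Date, one column per method;
# method/date combinations that were never applied become NA
wide <- tapply(long$value,
               list(paste(long$Part, long$Date), long$method),
               sum)

# Row-wise analogue of INDEX/MATCH(MIN(ABS(x - target))), keeping ties
best <- apply(wide, 1, function(r) {
  target <- r[["Type3"]]
  others <- r[names(r) != "Type3"]
  others <- others[!is.na(others)]
  if (is.na(target) || length(others) == 0) return(NA_character_)
  d <- abs(others - target)
  paste(names(others)[d == min(d)], collapse = "")
})
best
# A 2011-01: "Type2", A 2011-08: "Type1", A 2012-03: NA
```

With this data, Part A in 2011-01 comes out as Type2 (distances 42, 37 and 38 to Type3's value of 46), which agrees with the desired output below rather than with the Type1 in the worded example.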

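Then I want to create another column called FinalMethod: for each Part, the type that is listed most often should be repeated across all dates.

For example, for Part A, Type1 is the better match in 2011-01 and 2011-02, but Type2 is the better match in 2011-03. In that case I want Type1 to be the FinalMethod for all dates of this Part.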
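One reading of the FinalMethod rule, sketched in base R: within each Part, count how often each method is a winner (each method in a tie counts as its own winner) and keep the most frequent one, joining ties. Under that assumption it reproduces the FinalModel column of the desired output below. The `winners` table is derived by hand from the sample data (closest method to Type3 per Part and Date), and `most_frequent()` is a hypothetical helper, not an existing function.

```r
# One row per winning method per Part/Date; the B 2011-08 tie
# between Type1 and Type4 appears as two rows
winners <- data.frame(
  Part   = c("A", "A", "B", "B", "B", "C", "C"),
  Date   = c("2011-01", "2011-08", "2011-01", "2011-08", "2011-08",
             "2011-01", "2011-08"),
  method = c("Type2", "Type1", "Type1", "Type1", "Type4", "Type2", "Type2")
)

# Hypothetical helper: the most frequent value(s), ties joined with `sep`
most_frequent <- function(x, sep = "") {
  counts <- table(x)
  paste(names(counts)[counts == max(counts)], collapse = sep)
}

# Named vector: one FinalMethod per Part
final <- sapply(split(winners$method, winners$Part), most_frequent)
final
# A: "Type1Type2", B: "Type1", C: "Type2"
```

`df$FinalMethod <- unname(final[df$Part])` would then repeat each Part's value on every row of the question's `df`.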

I tried the following:

which(abs(x-your.number)==min(abs(x-your.number)))

but I have trouble picking out the right values each time and running it for every row.
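The `which()` attempt does work once it is run on one Part/Date group at a time, with `your.number` set to that group's Type3 value. Below is a minimal base-R sketch of that loop; it rebuilds the question's `df` compactly, assumes a group with no Type3 row (or nothing else to compare) should get NA, and joins tied methods without a separator to match the desired output. It is one possible approach, not the only one.

```r
# Same data as the question's df, written compactly
df <- data.frame(
  Date   = rep(c("2011-01", "2011-08", "2012-03", "2011-01", "2011-08",
                 "2011-01", "2011-08", "2011-01", "2011-08"), 3),
  Part   = rep(c("A", "B", "C"), each = 9),
  method = rep(c("Type1", "Type1", "Type1", "Type2", "Type2",
                 "Type3", "Type3", "Type4", "Type4"), 3),
  value  = c(4L, 46L, 43L, 9L, 8L, 46L, 63L, 84L, 2L,
             5L, 78L, 2L, 89L, 2L, 6L, 62L, 25L, 46L,
             3L, 4L, 7L, 24L, 13L, 21L, 19L, 8L, 3L)
)

df$BestMethod <- NA_character_
# Row indices of each Part/Date combination
groups <- split(seq_len(nrow(df)), list(df$Part, df$Date), drop = TRUE)

for (idx in groups) {
  target <- idx[df$method[idx] == "Type3"]
  others <- idx[df$method[idx] != "Type3"]
  if (length(target) != 1 || length(others) == 0) next  # leave NA
  your.number <- df$value[target]
  x <- df$value[others]
  # The asker's which() call, applied to one group at a time
  hits <- which(abs(x - your.number) == min(abs(x - your.number)))
  # Write the winner(s) back to every row of the group
  df$BestMethod[idx] <- paste(df$method[others][hits], collapse = "")
}

df[df$Date == "2011-08", c("Part", "method", "value", "BestMethod")]
```

For this data the resulting BestMethod column matches the desired BestModel column, with a real NA instead of the string "NA".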

Thanks.

Desired output:

df <- structure(list(Date =c("2011-01", "2011-08", "2012-03", "2011-01", "2011-08", "2011-01", "2011-08", "2011-01", "2011-08", 
                   "2011-01", "2011-08", "2012-03", "2011-01", "2011-08", "2011-01", "2011-08", "2011-01", "2011-08",
                   "2011-01", "2011-08", "2012-03", "2011-01", "2011-08", "2011-01", "2011-08", "2011-01", "2011-08"),
     Part=c("A", "A", "A", "A", "A", "A", "A", "A", "A",
            "B", "B", "B", "B", "B", "B", "B", "B", "B", 
            "C", "C", "C", "C", "C", "C", "C", "C", "C"),
     method=c("Type1","Type1","Type1","Type2","Type2","Type3","Type3","Type4","Type4",
              "Type1","Type1","Type1","Type2","Type2","Type3","Type3","Type4","Type4",
              "Type1","Type1","Type1","Type2","Type2","Type3","Type3","Type4","Type4"),
     value= c(4L, 46L, 43L, 9L, 8L, 46L, 63L, 84L, 2L, 5L, 78L, 2L, 89L, 2L, 6L, 62L, 25L, 46L, 3L, 4L, 7L, 24L, 13L, 21L, 19L, 8L, 3L),
     BestModel=c("Type2", "Type1", "NA", "Type2", "Type1", "Type2", "Type1", "Type2", "Type1", 
                 "Type1", "Type1Type4", "NA", "Type1", "Type1Type4", "Type1", "Type1Type4","Type1", "Type1Type4",
                 "Type2", "Type2", "NA",  "Type2", "Type2",  "Type2", "Type2",  "Type2", "Type2"), 
     FinalModel= c("Type1Type2", "Type1Type2","Type1Type2", "Type1Type2","Type1Type2", "Type1Type2","Type1Type2","Type1Type2","Type1Type2",
                   "Type1", "Type1", "Type1", "Type1", "Type1", "Type1","Type1", "Type1", "Type1", 
                   "Type2", "Type2","Type2", "Type2", "Type2", "Type2","Type2", "Type2", "Type2")), 
     class= "data.frame", row.names=c(NA, -27L))

1 Answer

Answer (score: 1)

A not very elegant solution using dplyr + tidyr, but it works:

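library(dplyr)
library(tidyr)

# For each Part/Date, copy the Type3 value onto every row of the group,
# measure each method's distance to it, and keep the closest non-Type3 row(s).
# Groups without a Type3 row get NA distances and are dropped by filter().
temp = df %>%
  group_by(Part, Date) %>%
  mutate(value.x = ifelse(method == "Type3", value, NA)) %>%
  fill(value.x, .direction = "up") %>%   # fill() stays within each group
  fill(value.x) %>%
  mutate(difference = abs(value.x - value)) %>%
  filter(method != "Type3") %>%
  filter(difference == min(difference))

# One BestMethod string per Part/Date (ties separated by a space)
BestMethod = temp %>%
  summarize(BestMethod = paste(method, collapse = " "))

# The method(s) that win most often within each Part
FinalMethod = temp %>%
  group_by(Part, method) %>%
  summarize(count = n()) %>%
  filter(count == max(count)) %>%
  rename(FinalMethod = method)

df %>%
  full_join(BestMethod, by = c("Part", "Date")) %>%
  full_join(FinalMethod, by = "Part") %>%
  select(-count) %>%
  arrange(Part, Date)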

Result:

      Date Part method value  BestMethod FinalMethod
1  2011-01    A  Type1     4       Type2       Type1
2  2011-01    A  Type1     4       Type2       Type2
3  2011-01    A  Type2     9       Type2       Type1
4  2011-01    A  Type2     9       Type2       Type2
5  2011-01    A  Type3    46       Type2       Type1
6  2011-01    A  Type3    46       Type2       Type2
7  2011-01    A  Type4    84       Type2       Type1
8  2011-01    A  Type4    84       Type2       Type2
9  2011-08    A  Type1    46       Type1       Type1
10 2011-08    A  Type1    46       Type1       Type2
11 2011-08    A  Type2     8       Type1       Type1
12 2011-08    A  Type2     8       Type1       Type2
13 2011-08    A  Type3    63       Type1       Type1
14 2011-08    A  Type3    63       Type1       Type2
15 2011-08    A  Type4     2       Type1       Type1
16 2011-08    A  Type4     2       Type1       Type2
17 2012-03    A  Type1    43        <NA>       Type1
18 2012-03    A  Type1    43        <NA>       Type2
19 2011-01    B  Type1     5       Type1       Type1
20 2011-01    B  Type2    89       Type1       Type1
21 2011-01    B  Type3     6       Type1       Type1
22 2011-01    B  Type4    25       Type1       Type1
23 2011-08    B  Type1    78 Type1 Type4       Type1
24 2011-08    B  Type2     2 Type1 Type4       Type1
25 2011-08    B  Type3    62 Type1 Type4       Type1
26 2011-08    B  Type4    46 Type1 Type4       Type1
27 2012-03    B  Type1     2        <NA>       Type1
28 2011-01    C  Type1     3       Type2       Type2
29 2011-01    C  Type2    24       Type2       Type2
30 2011-01    C  Type3    21       Type2       Type2
31 2011-01    C  Type4     8       Type2       Type2
32 2011-08    C  Type1     4       Type2       Type2
33 2011-08    C  Type2    13       Type2       Type2
34 2011-08    C  Type3    19       Type2       Type2
35 2011-08    C  Type4     3       Type2       Type2
36 2012-03    C  Type1     7        <NA>       Type2
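In the result above, every Part A row appears twice (36 rows instead of 27). Type1 and Type2 tie for Part A, so the FinalMethod table has two rows for A and the join duplicates each A row. One way around this, sketched here in base R rather than dplyr, is to collapse tied methods into one string per Part before joining, which also matches the Type1Type2 format of the desired output. The `counts` table reproduces, for this data, the per-Part counts the answer's `summarize(count = n())` step produces; `part_a` is an illustrative stand-in for Part A's nine input rows.

```r
# Per-Part win counts for this data, as in the answer's count step
counts <- data.frame(
  Part        = c("A", "A", "B", "B", "C"),
  FinalMethod = c("Type1", "Type2", "Type1", "Type4", "Type2"),
  count       = c(1L, 1L, 2L, 1L, 2L)
)

# Keep the top method(s) per Part, like filter(count == max(count))
top <- counts[counts$count == ave(counts$count, counts$Part, FUN = max),
              c("Part", "FinalMethod")]

# Collapse ties so each Part has exactly one row
collapsed <- aggregate(FinalMethod ~ Part, data = top, FUN = paste, collapse = "")

# Part A's nine input rows
part_a <- data.frame(
  Part   = "A",
  method = rep(c("Type1", "Type2", "Type3", "Type4"), times = c(3, 2, 2, 2))
)

nrow(merge(part_a, top, by = "Part"))        # two A rows in `top`: 18 rows
nrow(merge(part_a, collapsed, by = "Part"))  # one A row: 9 rows
collapsed
```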