grep records on multiple criteria

Time: 2014-02-25 17:52:57

Tags: r grep criteria

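I have a set of vendors and billed amounts, with the billed amounts grouped into buckets.

I want to subset the data set to only those vendors that have both a '<= 100' bucket and either a '500 - 1000' bucket or a '> 1000' bucket. Sample data: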

df <- structure(list(GrossAmt = c(74.37, 69.69, 705.76, 694.12, 5243, 
2680.95, 23270, 64.31, 64.31, 64.31, 1863.6, 4030.38, 43.86, 
36.57, 37.29, 31.02, 59.43, 27.65), VenName = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 1L, 1L, 
1L), .Label = c("Labcorp", "Quest Diagnostics Incorporated", 
"THOMAS JEFFERSON UNIV HOSPITAL", "WASHINGTON HOSPITAL CENTER"
), class = "factor"), AmtGrp = structure(c(1L, 1L, 3L, 3L, 2L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("<= 100", 
"> 1000", "500 - 1000"), class = "factor")), .Names = c("GrossAmt", 
"VenName", "AmtGrp"), class = "data.frame", row.names = c(NA, 
-18L))

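In my example, the resulting data set would contain the records from TJU Hospital & Washington Hospital Center, because both have bills < $100 & bills in one of the higher buckets. The other providers would be filtered out b/c they have no bills > $500.

I would show what I have done so far, but honestly I don't know where to start, so please bear with me. My first instinct is that I need a grep on the records based on the grouping criteria, but I don't know how to then match on the vendor's name.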

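Edit - extended question:

How would I filter for any vendor that falls into more than one amt group, regardless of which specific amt groups they are?

2 answers:

Answer 0: (score: 2)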

library(dplyr)

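# chain() is no longer available in current dplyr; the %>% pipe does the same job
df %>%
  group_by(VenName) %>%
  filter(any(AmtGrp == '<= 100'),     # vendor has at least one small bill
         !all(AmtGrp == '<= 100'))    # and at least one bill in a higher bucket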

Edit: for the second question

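# Same pipeline with %>% in place of the removed chain()
df %>%
  group_by(VenName) %>%
  filter(length(unique(AmtGrp)) > 1)  # vendor appears in more than one amt group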

Answer 1: (score: 2)

Here is a solution with the base functions ave and subset:

subset(df, as.logical(ave(as.character(AmtGrp), VenName, FUN = function(x) 
  any(x == "<= 100") & any(x %in% c("500 - 1000", "> 1000")))))

   GrossAmt                        VenName     AmtGrp
1     74.37 THOMAS JEFFERSON UNIV HOSPITAL     <= 100
2     69.69 THOMAS JEFFERSON UNIV HOSPITAL     <= 100
3    705.76 THOMAS JEFFERSON UNIV HOSPITAL 500 - 1000
4    694.12 THOMAS JEFFERSON UNIV HOSPITAL 500 - 1000
5   5243.00 THOMAS JEFFERSON UNIV HOSPITAL     > 1000
6   2680.95 THOMAS JEFFERSON UNIV HOSPITAL     > 1000
7  23270.00 THOMAS JEFFERSON UNIV HOSPITAL     > 1000
8     64.31     WASHINGTON HOSPITAL CENTER     <= 100
9     64.31     WASHINGTON HOSPITAL CENTER     <= 100
10    64.31     WASHINGTON HOSPITAL CENTER     <= 100
11  1863.60     WASHINGTON HOSPITAL CENTER     > 1000
12  4030.38     WASHINGTON HOSPITAL CENTER     > 1000
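
The base R answer above covers only the first question. For the extended question (vendors in more than one amt group, whichever groups those are), the same ave idea could be adapted roughly as follows. This is only a sketch: the shortened data frame and the names n_groups and multi are made up for illustration and are not part of the original answers.

```r
# Shortened stand-in for the question's data (same columns, fewer rows)
df <- data.frame(
  GrossAmt = c(74.37, 705.76, 5243, 64.31, 1863.6, 43.86, 36.57, 31.02),
  VenName  = c("THOMAS JEFFERSON UNIV HOSPITAL", "THOMAS JEFFERSON UNIV HOSPITAL",
               "THOMAS JEFFERSON UNIV HOSPITAL", "WASHINGTON HOSPITAL CENTER",
               "WASHINGTON HOSPITAL CENTER", "Quest Diagnostics Incorporated",
               "Quest Diagnostics Incorporated", "Labcorp"),
  AmtGrp   = c("<= 100", "500 - 1000", "> 1000", "<= 100", "> 1000",
               "<= 100", "<= 100", "<= 100"),
  stringsAsFactors = TRUE
)

# Number of distinct amt groups per vendor, repeated on every row of that vendor
n_groups <- ave(as.integer(df$AmtGrp), df$VenName,
                FUN = function(x) length(unique(x)))

# Keep only the rows of vendors that appear in more than one amt group
multi <- df[n_groups > 1, ]
print(multi)
```

Converting the factor to its integer codes lets ave return a numeric count directly, so no as.logical round trip through character is needed.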