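I have a list of vendors and settlement amounts, with the amounts grouped into buckets.
I want to subset the dataset to only those vendors that have both a '<= 100' bucket and either a '500 - 1000' bucket or a '> 1000' bucket. Sample data: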
df <- structure(list(GrossAmt = c(74.37, 69.69, 705.76, 694.12, 5243,
2680.95, 23270, 64.31, 64.31, 64.31, 1863.6, 4030.38, 43.86,
36.57, 37.29, 31.02, 59.43, 27.65), VenName = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 1L, 1L,
1L), .Label = c("Labcorp", "Quest Diagnostics Incorporated",
"THOMAS JEFFERSON UNIV HOSPITAL", "WASHINGTON HOSPITAL CENTER"
), class = "factor"), AmtGrp = structure(c(1L, 1L, 3L, 3L, 2L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("<= 100",
"> 1000", "500 - 1000"), class = "factor")), .Names = c("GrossAmt",
"VenName", "AmtGrp"), class = "data.frame", row.names = c(NA,
-18L))
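In my example, the resulting dataset would contain only the records from TJU Hospital and Washington Hospital Center, because both have bills <= $100 and bills in one of the higher buckets. The other providers would be filtered out because they have no bills > $500.
I would show what I have done so far, but honestly I don't know where to start, so please forgive me. My first instinct is that I need some kind of grep command on the records based on the grouping criteria, but I don't know how to match on the vendor's name.
EDIT - extended question:
How would I filter for any vendor that falls into more than one amount group, regardless of which specific amount groups they are?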
Answer 0 (score: 2)
library(dplyr)
# Keep vendors with at least one '<= 100' row and at least one row in another bucket
df %>%
  group_by(VenName) %>%
  filter(any(AmtGrp == '<= 100'),
         !all(AmtGrp == '<= 100'))
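Note that `!all(AmtGrp == '<= 100')` matches the stated requirement only because the sample data has exactly three buckets. As a rough sketch, assuming real data might contain other buckets (the "100 - 500" level and the vendors A, B, C below are made up for illustration), the higher buckets can be named explicitly:

```r
library(dplyr)

# Toy data; the "100 - 500" bucket and vendors A-C are hypothetical
toy <- data.frame(
  VenName = c("A", "A", "B", "B", "C", "C"),
  AmtGrp  = c("<= 100", "> 1000", "<= 100", "100 - 500", "<= 100", "500 - 1000"),
  stringsAsFactors = FALSE
)

# Buckets that count as "high" for this filter
high <- c("500 - 1000", "> 1000")

# Keep vendors with a '<= 100' row and at least one row in a high bucket
res <- toy %>%
  group_by(VenName) %>%
  filter(any(AmtGrp == "<= 100"), any(AmtGrp %in% high)) %>%
  ungroup()
```

Here vendor B is dropped even though it spans two buckets, because neither of them is one of the high buckets.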
Edit: for the second question
# Keep vendors whose rows fall into more than one amount group
df %>%
  group_by(VenName) %>%
  filter(length(unique(AmtGrp)) > 1)
Answer 1 (score: 2)
Here is an approach using the base functions ave and subset:
subset(df, as.logical(ave(as.character(AmtGrp), VenName, FUN = function(x)
any(x == "<= 100") & any(x %in% c("500 - 1000", "> 1000")))))
GrossAmt VenName AmtGrp
1 74.37 THOMAS JEFFERSON UNIV HOSPITAL <= 100
2 69.69 THOMAS JEFFERSON UNIV HOSPITAL <= 100
3 705.76 THOMAS JEFFERSON UNIV HOSPITAL 500 - 1000
4 694.12 THOMAS JEFFERSON UNIV HOSPITAL 500 - 1000
5 5243.00 THOMAS JEFFERSON UNIV HOSPITAL > 1000
6 2680.95 THOMAS JEFFERSON UNIV HOSPITAL > 1000
7 23270.00 THOMAS JEFFERSON UNIV HOSPITAL > 1000
8 64.31 WASHINGTON HOSPITAL CENTER <= 100
9 64.31 WASHINGTON HOSPITAL CENTER <= 100
10 64.31 WASHINGTON HOSPITAL CENTER <= 100
11 1863.60 WASHINGTON HOSPITAL CENTER > 1000
12 4030.38 WASHINGTON HOSPITAL CENTER > 1000
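The same ave pattern can probably be adapted to the follow-up question (vendors that fall into more than one amount group). The following is only a sketch on made-up toy data; the vendors A, B, C and the helper name `n_grp` are assumptions for illustration, not part of the original answer:

```r
# Toy data; vendors A-C are hypothetical
toy <- data.frame(
  VenName = c("A", "A", "B", "B", "C"),
  AmtGrp  = c("<= 100", "> 1000", "<= 100", "<= 100", "500 - 1000"),
  stringsAsFactors = FALSE
)

# Number of distinct buckets per vendor, repeated on every row of that vendor.
# ave() needs a numeric vector here, so the buckets are turned into integer codes first.
n_grp <- ave(as.integer(factor(toy$AmtGrp)), toy$VenName,
             FUN = function(x) length(unique(x)))

# Keep only rows of vendors that appear in more than one bucket
res <- toy[n_grp > 1, ]
```

Only vendor A remains in `res`, since B and C each appear in a single bucket.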