How to take a random sample based on customer ID

Time: 2018-09-26 06:26:26

Tags: r random sampling

My dataset looks like this:

Cust_ID SubDepartment       Date        Bill_Value
1513    Detergent Bars      20180204    30
1513    French Fries        20180803    160
1513    Fruit Based Juices  20180204    30
1513    Containers          20180803    79
1513    Candies & Lollypops 20180803    165
1513    Cheese              20180204    203.04
1513    Cleaners & Wipers   20180803    159
1513    Chocolates          20180803    448
1513    Frozen Veg Snacks   20180803    570
1860    Salted Biscuits     20180110    40
1860    Oats                20180705    60
1860    Pastries            20180815    198
1860    Cheese              20180110    85
1860    Veg Powdered Soups  20180814    20
1860    Paneer              20180123    99
1860    North Indian Mixes  20180502    59
1860    Flavoured Milk      20180502    40


I want to randomly sample the data by customer ID, so that the sample contains all transactions of each selected customer.

1 answer:

Answer 0 (score: 4)

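Here is a simple solution: first get all the unique IDs, draw a random sample of them, and then subset the data frame to the rows whose ID is in that sample: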

df[df$Cust_ID %in% sample(unique(df$Cust_ID), n),]

where n is the number of IDs to sample.
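The same steps can be written out one at a time. The sketch below is only an illustration: the helper name `sample_customers`, its `id_col` argument, and the `toy` data frame are made up here and are not part of the original answer. It also indexes with `sample.int()` instead of calling `sample()` on the IDs directly, because `sample(x, ...)` treats a single number `x >= 1` as the range `1:x`, which could matter if the data happened to contain only one numeric customer ID.

```r
# Hypothetical helper (not from the original answer): pick n customers at
# random and keep every row that belongs to them.
sample_customers <- function(df, n, id_col = "Cust_ID") {
  ids <- unique(df[[id_col]])               # one entry per customer
  stopifnot(n <= length(ids))               # cannot pick more customers than exist
  picked <- ids[sample.int(length(ids), n)] # sample positions, then map back to IDs
  df[df[[id_col]] %in% picked, , drop = FALSE]
}

# Made-up data for illustration only
toy <- data.frame(
  Cust_ID    = c(1, 1, 2, 2, 2, 3),
  Bill_Value = c(30, 160, 40, 60, 198, 85)
)

set.seed(42)  # only so that the example is reproducible
res <- sample_customers(toy, 2)
print(res)
```

Which customers end up in `res` depends on the seed and the R version, so no output is shown here.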

Using your data:

 set.seed(1) #to be reproducible  
 df[df$Cust_ID %in% sample(unique(df$Cust_ID), 1),]

Output:

  Cust_ID       SubDepartment     Date Bill_Value
1    1513      Detergent.Bars 20180204      30.00
2    1513        French.Fries 20180803     160.00
3    1513  Fruit.Based.Juices 20180204      30.00
4    1513          Containers 20180803      79.00
5    1513 Candies.&.Lollypops 20180803     165.00
6    1513              Cheese 20180204     203.04
7    1513   Cleaners.&.Wipers 20180803     159.00
8    1513          Chocolates 20180803     448.00
9    1513   Frozen.Veg.Snacks 20180803     570.00
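To confirm that a sample taken this way really keeps every transaction of each selected customer, one option is to compare the row counts per ID in the sample with those in the full data. This is a minimal sketch using a made-up data frame (`toy`, with a third, hypothetical customer 2001), not the data from the question.

```r
# Made-up data for illustration: three customers with 2, 3 and 1 transactions
toy <- data.frame(
  Cust_ID    = c(1513, 1513, 1860, 1860, 1860, 2001),
  Bill_Value = c(30, 160, 40, 60, 198, 85)
)

set.seed(1)  # only so that the example is reproducible
picked  <- sample(unique(toy$Cust_ID), 2)
sampled <- toy[toy$Cust_ID %in% picked, ]

# Number of rows per customer in the full data and in the sample
full_counts   <- table(toy$Cust_ID)
sample_counts <- table(sampled$Cust_ID)

# TRUE if every sampled customer has all of their rows in the sample
complete <- all(as.vector(sample_counts) ==
                as.vector(full_counts[names(sample_counts)]))
print(complete)
```

Because the subset is built with `%in%` on the ID column, `complete` should always be TRUE, whichever customers are drawn.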

Data:

 df <- read.table(text = "Cust_ID SubDepartment       Date        Bill_Value
 1513    Detergent.Bars      20180204    30
 1513    French.Fries        20180803    160
 1513    Fruit.Based.Juices  20180204    30
 1513    Containers          20180803    79
 1513    Candies.&.Lollypops 20180803    165
 1513    Cheese              20180204    203.04
 1513    Cleaners.&.Wipers   20180803    159
 1513    Chocolates          20180803    448
 1513    Frozen.Veg.Snacks   20180803    570
 1860    Salted.Biscuits     20180110    40
 1860    Oats                20180705    60
 1860    Pastries            20180815    198
 1860    Cheese              20180110    85
 1860    Veg.Powdered.Soups  20180814    20
 1860    Paneer              20180123    99
 1860    North.Indian.Mixes  20180502    59
 1860    Flavoured.Milk      20180502    40", header = TRUE)