My dataset is as follows:
Cust_ID  SubDepartment        Date      Bill_Value
1513     Detergent Bars       20180204  30
1513     French Fries         20180803  160
1513     Fruit Based Juices   20180204  30
1513     Containers           20180803  79
1513     Candies & Lollypops  20180803  165
1513     Cheese               20180204  203.04
1513     Cleaners & Wipers    20180803  159
1513     Chocolates           20180803  448
1513     Frozen Veg Snacks    20180803  570
1860     Salted Biscuits      20180110  40
1860     Oats                 20180705  60
1860     Pastries             20180815  198
1860     Cheese               20180110  85
1860     Veg Powdered Soups   20180814  20
1860     Paneer               20180123  99
1860     North Indian Mixes   20180502  59
1860     Flavoured Milk       20180502  40
I want to randomly sample the data by customer ID, so that all transactions of each selected customer are included.
Answer 0 (score: 4)
Here is a simple solution: get all unique IDs, sample from them, and subset the data frame to the rows whose ID is in that sample:
df[df$Cust_ID %in% sample(unique(df$Cust_ID), n),]
where n is the number of IDs to sample.
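As a hedged sketch, the one-liner above could be wrapped in a small helper. The name `sample_customers`, its `id_col` argument, and the toy data frame are assumptions made for illustration and are not part of the original answer. The helper samples positions with `sample.int` because `sample(x, n)` treats a single numeric `x` as `1:x`, which would return wrong IDs if the data happened to contain only one customer.

```r
# Hypothetical helper: keep every row that belongs to n randomly chosen customers.
sample_customers <- function(df, n, id_col = "Cust_ID") {
  ids <- unique(df[[id_col]])
  stopifnot(n <= length(ids))
  # Sample positions, not values: sample(x, n) would treat a lone
  # numeric ID x as the range 1:x.
  picked <- ids[sample.int(length(ids), n)]
  df[df[[id_col]] %in% picked, , drop = FALSE]
}

# Made-up toy data, for illustration only
toy <- data.frame(
  Cust_ID    = c(1, 1, 2, 3, 3, 3),
  Bill_Value = c(10, 20, 30, 40, 50, 60)
)

set.seed(42)  # for reproducibility
res <- sample_customers(toy, 2)
```

Whichever two customers are drawn, `res` contains all of their rows and none from the remaining customer.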
Using your data:
set.seed(1) #to be reproducible
df[df$Cust_ID %in% sample(unique(df$Cust_ID), 1),]
  Cust_ID       SubDepartment     Date Bill_Value
1    1513      Detergent.Bars 20180204      30.00
2    1513        French.Fries 20180803     160.00
3    1513  Fruit.Based.Juices 20180204      30.00
4    1513          Containers 20180803      79.00
5    1513 Candies.&.Lollypops 20180803     165.00
6    1513              Cheese 20180204     203.04
7    1513   Cleaners.&.Wipers 20180803     159.00
8    1513          Chocolates 20180803     448.00
9    1513   Frozen.Veg.Snacks 20180803     570.00
Data:
df <- read.table(text = "Cust_ID SubDepartment Date Bill_Value
1513 Detergent.Bars 20180204 30
1513 French.Fries 20180803 160
1513 Fruit.Based.Juices 20180204 30
1513 Containers 20180803 79
1513 Candies.&.Lollypops 20180803 165
1513 Cheese 20180204 203.04
1513 Cleaners.&.Wipers 20180803 159
1513 Chocolates 20180803 448
1513 Frozen.Veg.Snacks 20180803 570
1860 Salted.Biscuits 20180110 40
1860 Oats 20180705 60
1860 Pastries 20180815 198
1860 Cheese 20180110 85
1860 Veg.Powdered.Soups 20180814 20
1860 Paneer 20180123 99
1860 North.Indian.Mixes 20180502 59
1860 Flavoured.Milk 20180502 40", header = TRUE)