I have two data frames, which can be reproduced with the following code:
df=data.frame(xcode=c("612","920","924","925"),
ratio.company1=c("0.1","0.9","0.4","0"),
ratio.company2=c("0.1","0","0.6","0.6"),
ratio.company3=c("0.8","0.1","0","0.4"))
df
df2=data.frame(id=c("101","101","101","101","101","101","102","102","102","102","102","103","103","104","104","104","104","104","104","104","104","105","105","105","106","106","106","106","106","106","107","107","107","107","107","107"),
xcode=c("612","612","612","612","612","612","612","612","612","612","612","920","920","920","920","920","920","920","920","920","920","924","924","924","924","924","924","924","924","924","925","925","925","925","925","925"),
company=c(""))
df2
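df gives, for each value of the xcode field, the probability of assigning a person to company1, company2, or company3. df2 gives me the IDs and their xcodes. Based on the ratios for each xcode, the IDs in df2 need to be split among company1, company2, and company3.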
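For example, of the 11 rows (IDs) with xcode 612, 10% should be assigned to company1, 10% to company2, and 80% to company3. I want the resulting counts rounded to 0 decimal places. I can't think of a way to achieve this. Can I use the runif command to do it? Please help.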
My resulting dataset should look like this:
df2=data.frame(id=c("101","101","101","101","101","101","102","102","102","102","102","103","103","104","104","104","104","104","104","104","104","105","105","105","106","106","106","106","106","106","107","107","107","107","107","107"),
xcode=c("612","612","612","612","612","612","612","612","612","612","612","920","920","920","920","920","920","920","920","920","920","924","924","924","924","924","924","924","924","924","925","925","925","925","925","925"),
company=c("company1","company2","company3","company3","company3","company3","company3","company3","company3","company3","company3",
"company1","company1","company1","company1","company1","company1","company1","company1","company1","company3",
"company1","company1","company1","company1","company2","company2","company2","company2","company2",
"company2","company2","company2","company2","company3","company3"))
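The result shown above is deterministic rather than random: each xcode group gets round(n * ratio) rows per company, filled in row order. A minimal sketch of that interpretation, assuming the ratios in each row of df sum to 1, could look like this. `allocate_counts` is a hypothetical helper, not a base R function; it uses largest-remainder rounding (an assumption on my part) so the counts always add up to the group size, and `stringsAsFactors = FALSE` is added so the ratio strings are not turned into factor codes on older R versions.

```r
# Data from the question; stringsAsFactors = FALSE keeps the ratios as text, not factor codes
df <- data.frame(xcode = c("612", "920", "924", "925"),
                 ratio.company1 = c("0.1", "0.9", "0.4", "0"),
                 ratio.company2 = c("0.1", "0", "0.6", "0.6"),
                 ratio.company3 = c("0.8", "0.1", "0", "0.4"),
                 stringsAsFactors = FALSE)
df2 <- data.frame(id = rep(c("101", "102", "103", "104", "105", "106", "107"),
                           times = c(6, 5, 2, 8, 3, 6, 6)),
                  xcode = rep(c("612", "920", "924", "925"), times = c(11, 10, 9, 6)),
                  stringsAsFactors = FALSE)

# Numeric ratio matrix: one row per xcode, one column per company
ratios <- sapply(df[, -1], function(x) as.numeric(as.character(x)))
rownames(ratios) <- as.character(df$xcode)
companies <- sub("^ratio\\.", "", colnames(ratios))   # "company1" "company2" "company3"

# Hypothetical helper: whole-number counts for n rows that follow `ratio`
# (largest-remainder rounding, so the counts sum to n when sum(ratio) == 1)
allocate_counts <- function(n, ratio) {
  exact  <- round(n * ratio, 9)           # trim floating-point noise such as 2.9999999999
  counts <- floor(exact)
  short  <- n - sum(counts)               # rows left over after rounding down
  if (short > 0) {
    extra <- order(exact - counts, decreasing = TRUE)[seq_len(short)]
    counts[extra] <- counts[extra] + 1    # largest fractional parts get the extra rows
  }
  counts
}

# Fill each xcode group in row order: company1 rows first, then company2, then company3
df2$company <- NA_character_
for (code in unique(df2$xcode)) {
  rows <- which(df2$xcode == code)
  counts <- allocate_counts(length(rows), ratios[code, ])
  df2$company[rows] <- rep(companies, times = counts)
}
df2
```

For this data it reproduces the company column from the question (for example 1, 1 and 9 rows for xcode 612, since 1.1, 1.1 and 8.8 round that way). If you want the assignment within each group to be random instead of in row order, wrap the `rep(...)` call in `sample()`.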
Answer 0 (score: 0)
This gives one possible interpretation of your request:
c('comp1','comp2','comp3')[
findInterval( runif(36) ,
c(0, cumsum( as.numeric(as.character(df[1,2:4]))) ))]
#-----------
[1] "comp3" "comp3" "comp3" "comp3" "comp2" "comp3" "comp3" "comp3" "comp3"
[10] "comp3" "comp3" "comp3" "comp2" "comp3" "comp1" "comp3" "comp1" "comp3"
[19] "comp1" "comp3" "comp3" "comp3" "comp3" "comp2" "comp3" "comp3" "comp1"
[28] "comp3" "comp3" "comp3" "comp3" "comp3" "comp3" "comp2" "comp3" "comp3"
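The snippet above draws all 36 rows with the ratios of xcode 612 only (`df[1, 2:4]`). A minimal sketch of the same runif-plus-cumulative-probability idea, but using each row's own xcode, might look like this. Comparing `u` against the row's cut points and counting how many it exceeds is equivalent to `findInterval` for a single row; the last cumulative column is dropped so the index stays in 1..3 even if the ratios sum to slightly more or less than 1 in floating point. `set.seed(1)` is only there to make the draw reproducible.

```r
df <- data.frame(xcode = c("612", "920", "924", "925"),
                 ratio.company1 = c("0.1", "0.9", "0.4", "0"),
                 ratio.company2 = c("0.1", "0", "0.6", "0.6"),
                 ratio.company3 = c("0.8", "0.1", "0", "0.4"),
                 stringsAsFactors = FALSE)
df2 <- data.frame(id = rep(c("101", "102", "103", "104", "105", "106", "107"),
                           times = c(6, 5, 2, 8, 3, 6, 6)),
                  xcode = rep(c("612", "920", "924", "925"), times = c(11, 10, 9, 6)),
                  stringsAsFactors = FALSE)

ratios <- sapply(df[, -1], function(x) as.numeric(as.character(x)))
rownames(ratios) <- as.character(df$xcode)
companies <- sub("^ratio\\.", "", colnames(ratios))

# Cumulative probabilities per xcode, e.g. 612 -> 0.1, 0.2, 1.0
cum <- t(apply(ratios, 1, cumsum))

set.seed(1)
u <- runif(nrow(df2))                      # one uniform draw per row of df2
# Cut points for each row's own xcode (36 x 2 matrix)
cuts <- cum[as.character(df2$xcode), -ncol(cum), drop = FALSE]
# u is recycled down the columns, so element [i, j] compares u[i] with cuts[i, j]
idx <- 1 + rowSums(u > cuts)
df2$company <- companies[idx]
table(df2$xcode, df2$company)
```

A company with ratio 0 for an xcode can never be drawn (its interval has zero width), but the counts per group are random, so they will usually not match the rounded proportions exactly.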
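In my experience answering similar questions, there is usually an unstated expectation that the result will follow the 0.1, 0.1 and 0.8 proportions exactly, and this approach does not meet that expectation. If you want the assignments in exactly those proportions (or nearly exactly, since 10% of 36 is not a whole number), you would need to use rdirichlet instead of runif. Alternatively, you can use sample on the vector c(rep('comp1', 3), rep('comp2', 4), rep('comp3', 29)).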
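The `sample` suggestion can be sketched as follows: calling `sample()` on a fixed pool returns a random permutation, so the order is random while the counts stay exactly 3, 4 and 29. The second part applies the same idea per xcode with counts from plain `round(n * ratio)`, which is the rounding the question asks for. That per-xcode extension is my own assumption about the intended use; because plain rounding does not always add up to the group size, the loop stops with an error instead of silently producing the wrong number of rows.

```r
# Answer's suggestion: shuffle a pool with fixed counts (3 + 4 + 29 = 36)
pool <- c(rep("comp1", 3), rep("comp2", 4), rep("comp3", 29))
shuffled <- sample(pool)          # random permutation; the counts are unchanged
print(table(shuffled))

# Same idea per xcode, with counts = round(n * ratio)
df <- data.frame(xcode = c("612", "920", "924", "925"),
                 ratio.company1 = c("0.1", "0.9", "0.4", "0"),
                 ratio.company2 = c("0.1", "0", "0.6", "0.6"),
                 ratio.company3 = c("0.8", "0.1", "0", "0.4"),
                 stringsAsFactors = FALSE)
df2 <- data.frame(id = rep(c("101", "102", "103", "104", "105", "106", "107"),
                           times = c(6, 5, 2, 8, 3, 6, 6)),
                  xcode = rep(c("612", "920", "924", "925"), times = c(11, 10, 9, 6)),
                  stringsAsFactors = FALSE)

ratios <- sapply(df[, -1], function(x) as.numeric(as.character(x)))
rownames(ratios) <- as.character(df$xcode)
companies <- sub("^ratio\\.", "", colnames(ratios))

df2$company <- NA_character_
for (code in unique(df2$xcode)) {
  rows <- which(df2$xcode == code)
  counts <- round(length(rows) * ratios[code, ])
  # Plain rounding can over- or undershoot the group size; fail loudly if it does
  if (sum(counts) != length(rows)) {
    stop("rounded counts do not sum to the group size for xcode ", code)
  }
  # Exact rounded counts, in random order within the group
  df2$company[rows] <- sample(rep(companies, times = counts))
}
table(df2$xcode, df2$company)
```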