Simulating car models from car types in SAS?

Time: 2015-04-10 15:19:30

Tags: model statistics sas simulation

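I am running a simulation to predict inventory for a car dealership. The dealership sells zero cars 10% of the time, one car 50% of the time, two cars 30% of the time, and three cars 10% of the time. Of the cars sold, 50% are sedans, 30% are SUVs, and 20% are trucks. In addition, the models sold break down as follows:

  • Sedan: 30% are 'S', 40% are 'SE', 30% are 'SEL'
  • SUV: 25% are 'S', 35% are 'SE', 40% are 'SEL'
  • Truck: 50% are 'S', 30% are 'SE', 20% are 'SEL'

I am trying to generate these last model distributions, but my code prints 'S' for every date. Can you offer any suggestions? Thanks!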

Here is the code as text:

data cars (drop=perc_sales car_type_perc subtype_perc);
format date date9.;
    date='01Jan2016'd;
    do until (date > '31Dec2016'd);
        if 
            weekday(date) not in (1)
            and date not in ('01Jan2016'd,'04Jul2016'd,'25Dec2016'd)
            then output;
        date=intnx('day',date,1);
    S_price1 = 15000+200*rand('normal');    *Sedan;
    S_price2 = 30000+500*rand('normal');    *SUV;
    S_price3 = 25000+300*rand('normal');    *Truck;
    perc_sales = rand('uniform');
    car_type_perc = rand('uniform');
    subtype_perc = rand('uniform');
        if 0 < perc_sales <= 0.1 then Ncars = 0;                * 10% chance zero cars sold;
        if 0.1 < perc_sales <= 0.6 then Ncars = 1;              * 50% chance one car sold;
        if 0.6 < perc_sales <= 0.9 then Ncars = 2;              * 30% chance two cars sold;
        if 0.9 < perc_sales <= 1 then Ncars = 3;                * 10% chance three cars sold;
        if 0 < car_type_perc <= 0.5 then type = 'Sedan';        * 50% of cars sold are sedans;
        if 0.5 < car_type_perc <= 0.8 then type = 'SUV';        * 30% of cars sold are SUVs;
        if 0.8 < car_type_perc <= 1 then type = 'Truck';        * 20% of cars sold are trucks;
        if type = 'Sedan' and 0 < rand('uniform') <= 0.3 then model = 'S';
        if type = 'Sedan' and 0.3 < rand('uniform') <= 0.7 then model = 'SE';
        if type = 'Sedan' and 0.7 < rand('uniform') <= 1 then model = 'SEL';
        if type = 'SUV' and 0 < rand('uniform') <= 0.25 then model = 'S';
        if type = 'SUV' and 0.25 < rand('uniform') <= 0.6 then model = 'SE';
        if type = 'SUV' and 0.6 < rand('uniform') <= 1 then model = 'SEL';
        if type = 'Truck' and 0 < rand('uniform') <= 0.5 then model = 'S';
        if type = 'Truck' and 0.5 < rand('uniform') <= 0.8 then model = 'SE';
        if type = 'Truck' and 0.8 < rand('uniform') <= 1 then model = 'SEL';
    end;
run;

1 answer:

Answer 0 (score: 0)

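First, your variable model has a length of $1, because the first value SAS sees assigned to it is 'S'. Every later 'SE' or 'SEL' is silently truncated to 'S', which is why you see 'S' on every row. Always define the length of your character variables.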
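To make the truncation concrete, here is a minimal sketch (the variable name model2 is made up for illustration) that compares an undeclared character variable with one declared through a LENGTH statement:

```sas
data _null_;
    model = 'S';        /* first assignment fixes the length at $1 */
    model = 'SEL';      /* silently truncated to fit $1 */
    put model=;         /* log shows model=S */

    length model2 $3;   /* declared before its first assignment */
    model2 = 'S';
    model2 = 'SEL';
    put model2=;        /* log shows model2=SEL */
run;
```

SAS writes no warning for the truncated assignment, so the only symptom is the shortened value in the output.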

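Second, you need to assign rand('uniform') to a variable and use that variable when computing the model. In the code above every call to rand('uniform') returns a new random number, so the three conditions for a given type are tested against three different draws and sometimes none of them is true. You did this correctly for car_type_perc, but not for the model: subtype_perc is drawn and then never used. When no condition matches, no model is assigned in that iteration. Because everything happens inside one data step loop and values carry over from the previous iteration, you may not notice this anywhere except in the first rows, but the code is not currently doing what you want.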
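As a rough illustration of the point above, the sketch below draws one uniform number per row and runs it through a single IF / ELSE IF chain, so exactly one branch fires. The data set name one_draw, the fixed type of 'Sedan', and the ten-row loop are assumptions made only to keep the example short:

```sas
data one_draw (drop=i);
    length type $5 model $3;
    call streaminit(7);                  /* arbitrary seed */
    type = 'Sedan';                      /* fixed here for illustration only */
    do i = 1 to 10;
        subtype_perc = rand('uniform');  /* one draw per row */
        if subtype_perc <= 0.3 then model = 'S';         /* 30% */
        else if subtype_perc <= 0.7 then model = 'SE';   /* 40% */
        else model = 'SEL';                              /* 30% */
        output;
    end;
run;
```

With three separate draws, as in the question, a sedan row matches none of the three conditions with probability 0.7 × 0.6 × 0.7 = 0.294, so roughly 29% of sedan rows would get no new model at all.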

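Third, why is your OUTPUT at the top of the loop? Since you exclude January 1 it does not really break anything, but you are writing the row before you calculate anything for it, which is a bit odd. Normally you would put a continue; statement in the IF block and output at the end of the loop in all other cases.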

Fourth, you should add a call to call streaminit(#) (with any positive number you like) so that your sample is reproducible.
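A minimal sketch of seeding, assuming a throwaway data set named draws and a seed of 7; running the step twice should produce the same five values both times:

```sas
data draws (drop=i);
    call streaminit(7);      /* any positive integer; fixes the random stream */
    do i = 1 to 5;
        u = rand('uniform');
        output;
    end;
run;
```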

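Finally, I think there are better ways to do this (although the approach above is not particularly bad, just a bit hard to maintain and hard to always get right). Create a "source" data set holding the basics, for example 1000 cars that are not randomly distributed but match the proportions above (basically every possible combination of the three sets of percentages; you could use fewer rows, but 1000 is easy). Then PROC SURVEYSELECT with a with-replacement method will generate a sample of whatever size you want. Rick Wicklin has a book, Simulating Data with SAS, and some great posts on his blog, The DO Loop, that may be helpful here.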
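Here is one way the source-data-set idea might look. It is a sketch under assumptions: the data set names source and sample, the sample size of 500, and the seed are all made up, and only the type and model proportions are encoded (the number of cars sold per day is left out). The counts are the percentages above applied to 1000 cars, for example 1000 × 0.5 × 0.3 = 150 sedans with model S.

```sas
/* 1000 cars whose type/model mix matches the stated percentages */
data source (drop=i count);
    length type $5 model $3;
    input type $ model $ count;
    do i = 1 to count;       /* expand each combination into count rows */
        output;
    end;
    datalines;
Sedan S 150
Sedan SE 200
Sedan SEL 150
SUV S 75
SUV SE 105
SUV SEL 120
Truck S 100
Truck SE 60
Truck SEL 40
;
run;

/* METHOD=URS samples with replacement; OUTHITS writes one row per selection */
proc surveyselect data=source out=sample method=urs
                  sampsize=500 outhits seed=7;
run;
```

Changing a percentage then means editing one number in the datalines instead of rewriting a chain of IF conditions.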

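Here is a version with a few small changes that makes this work properly and simplifies it a little. I would do this somewhat differently, because very long IF statements are hard to read and maintain, but if this works for you, that is your choice.

Major changes beyond those described above: I changed the loop to an iterative DO loop, because dates are integers and it is easier to read; added STOP because it is technically necessary (SAS will work it out, but you should have it); moved OUTPUT to the bottom and added CONTINUE at the top. I also added a CALL MISSING after the OUTPUT to make sure no details carry over into the next row.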

data cars (drop=perc_sales car_type_perc subtype_perc);
    format date date9.;  
    length model $3 type $5;
    call streaminit(7);
    do date = '01Jan2016'd to '31Dec2016'd;
        if weekday(date) = 1 or date in ('01Jan2016'd,'04Jul2016'd,'25Dec2016'd)
            then continue;                      * skip Sundays and the three holidays;
        S_price1 = 15000+200*rand('normal');    *Sedan;
        S_price2 = 30000+500*rand('normal');    *SUV;
        S_price3 = 25000+300*rand('normal');    *Truck;
        perc_sales = rand('uniform');
        car_type_perc = rand('uniform');
        subtype_perc = rand('uniform');
        if 0 < perc_sales <= 0.1 then Ncars = 0;                * 10% chance zero cars sold;
        if 0.1 < perc_sales <= 0.6 then Ncars = 1;              * 50% chance one car sold;
        if 0.6 < perc_sales <= 0.9 then Ncars = 2;              * 30% chance two cars sold;
        if 0.9 < perc_sales <= 1 then Ncars = 3;                * 10% chance three cars sold;
        if 0 < car_type_perc <= 0.5 then type = 'Sedan';        * 50% of cars sold are sedans;
        if 0.5 < car_type_perc <= 0.8 then type = 'SUV';        * 30% of cars sold are SUVs;
        if 0.8 < car_type_perc <= 1 then type = 'Truck';        * 20% of cars sold are trucks;
        if type = 'Sedan' and 0 < subtype_perc <= 0.3 then model = 'S';
        if type = 'Sedan' and 0.3 < subtype_perc <= 0.7 then model = 'SE';
        if type = 'Sedan' and 0.7 < subtype_perc <= 1 then model = 'SEL';
        if type = 'SUV' and 0 < subtype_perc <= 0.25 then model = 'S';
        if type = 'SUV' and 0.25 < subtype_perc <= 0.6 then model = 'SE';
        if type = 'SUV' and 0.6 < subtype_perc <= 1 then model = 'SEL';
        if type = 'Truck' and 0 < subtype_perc <= 0.5 then model = 'S';
        if type = 'Truck' and 0.5 < subtype_perc <= 0.8 then model = 'SE';
        if type = 'Truck' and 0.8 < subtype_perc <= 1 then model = 'SEL';
        output;
        call missing(of model type ncars);
    end;
    stop;
run;
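If the long IF chains become hard to maintain, one possible alternative (not necessarily what I had in mind above, and not part of the code as posted) is the 'TABLE' distribution of the RAND function, which returns an integer category index for a list of probabilities. The sketch below assumes a data set name of cars_table, lookup arrays named types and models, and a ten-row loop for brevity:

```sas
data cars_table (drop=i t m);
    length type $5 model $3;
    /* lookup tables that map a category index to a label */
    array types[3]  $5 _temporary_ ('Sedan' 'SUV' 'Truck');
    array models[3] $3 _temporary_ ('S' 'SE' 'SEL');
    call streaminit(7);
    do i = 1 to 10;
        t = rand('table', 0.5, 0.3, 0.2);                    /* 1=Sedan 2=SUV 3=Truck */
        select (t);
            when (1)  m = rand('table', 0.30, 0.40, 0.30);   /* Sedan models */
            when (2)  m = rand('table', 0.25, 0.35, 0.40);   /* SUV models */
            otherwise m = rand('table', 0.50, 0.30, 0.20);   /* Truck models */
        end;
        type  = types[t];
        model = models[m];
        output;
    end;
run;
```

Each row uses exactly one draw per decision, and the probabilities sit next to each other where they are easy to check against the specification.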