Question

My dataset looks like the following. It contains four variables: the country name Country, the company identifier Company, Year, and Date.

Country  Company  Year  Date
-------  -------  ----  ----------
A        1        2000  2000/01/02
A        1        2001  2001/01/03
A        1        2001  2001/07/02
A        1        2000  2001/08/03
B        2        2000  2001/08/03
C        3        2000  2001/08/03

I know how to count the number of distinct companies in each country. I did it with the following code.

proc sql;
   create table lib.count as
   select country, count(distinct company) as count 
   from lib.data
   group by country;
quit;

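My question is how to count the number of distinct company-years in each country. Basically, I want to know how many different companies, or the same company in different years, there are. If the same company is observed twice in the same year, I want to count it as 1 distinct value. If the same company has two observations in different years, I want to count them as two distinct values. I would like the output to look like the following (one number per country):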

Country  No. firm_year
A        2
B        1
C        1

Can anyone show me how to do this?

Answer 1

A quick way is to concatenate all the variables you want to compare into a new variable. Something like:

data data_mod;
    set lib.data;
    length company_year $ 20;
    company_year = cats(company, year);   /* e.g. company 1, year 2000 -> "12000" */
run;

Then you can run count(distinct company_year) in PROC SQL.
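A minimal sketch of both steps. It runs on the question's sample rows, rebuilt here (without Date) as a hypothetical WORK table `sample`; swap in `lib.data` for real data. The output names `count_fy` and `count_fy2` are illustrative. The second query skips the extra DATA step and builds the key inline with CATX, whose separator keeps pairs such as (1, 12) and (11, 2) apart; with CATS both would collapse to `112`.

```sas
/* Question's sample rows (Date omitted) so the sketch runs on its own. */
data sample;
   input Country $ Company Year @@;
   datalines;
A 1 2000  A 1 2001  A 1 2001  A 1 2000  B 2 2000  C 3 2000
;

/* Step 1, as in the answer: one concatenated key per company-year. */
data data_mod;
   set sample;
   length company_year $ 20;
   company_year = cats(company, year);
run;

/* Step 2: count the distinct keys per country. */
proc sql;
   create table count_fy as
   select country, count(distinct company_year) as firm_year
   from data_mod
   group by country;
quit;

/* Same count without the extra DATA step: build the key inline.  */
/* CATX puts '_' between the values, so 1_12 and 11_2 stay apart. */
proc sql;
   create table count_fy2 as
   select country, count(distinct catx('_', company, year)) as firm_year
   from sample
   group by country;
quit;
```

For the sample rows both tables should hold A = 2, B = 1, C = 1.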

Answer 2

You need a nested query, as @DaBigNikoladze hinted:

The "inner" query produces the distinct Country + Company + Year combinations;
the "outer" query counts, for each country, the rows returned by the inner query.

Generate the dataset

data have;
  informat Country $1.
           Company 1.
           Year 4.
           Date YYMMDD10.;
  format Date YYMMDDs10.;
  input country company year date;
  datalines;
A 1 2000 2000/01/02
A 1 2001 2001/01/03
A 1 2001 2001/07/02
A 1 2000 2001/08/03
B 2 2000 2001/08/03
C 3 2000 2001/08/03
;

Run the query

PROC SQL;
  CREATE TABLE want AS 
    SELECT country, Count(company) AS Firm_year 
      FROM (SELECT DISTINCT country, company, year FROM have) 
    GROUP  BY country; 
QUIT;

Result

Country Firm_year 
A       2 
B       1 
C       1
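To see what the outer query actually counts, the inner query can be run on its own. This sketch uses a hypothetical WORK table `sample` holding the question's rows (Date omitted) in place of `have`; `inner_rows` is an illustrative name. On these rows it should return four rows (A 1 2000, A 1 2001, B 2 2000, C 3 2000), which is why the outer COUNT gives 2 for A and 1 for B and C.

```sas
/* Question's sample rows (Date omitted) so the sketch runs on its own. */
data sample;
   input Country $ Company Year @@;
   datalines;
A 1 2000  A 1 2001  A 1 2001  A 1 2000  B 2 2000  C 3 2000
;

/* The inner query alone: one row per distinct country-company-year. */
proc sql;
   create table inner_rows as
   select distinct country, company, year
   from sample
   order by country, company, year;
quit;
```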

Answer 3

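proc sort data=lib.data out=temp nodupkey;   /* one row per country-company-year */
   by country company year;
run;

data firm_year(keep=country cnt_fyr);
   set temp;                                  /* read the deduplicated table */
   by country company year;
   retain cnt_fyr;
   if first.country then cnt_fyr = 1;         /* restart the count per country */
   else cnt_fyr + 1;
   if last.country;                           /* output one row per country */
run;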
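Because the NODUPKEY sort already leaves one row per firm-year, the counting step does not have to be a DATA step. A sketch using PROC FREQ instead, assuming a hypothetical WORK table `sample` with the question's rows (Date omitted); `firm_year_freq` is an illustrative name. The OUT= table of PROC FREQ holds the frequency in a variable named COUNT, plus PERCENT, which is dropped here.

```sas
/* Question's sample rows (Date omitted) so the sketch runs on its own. */
data sample;
   input Country $ Company Year @@;
   datalines;
A 1 2000  A 1 2001  A 1 2001  A 1 2000  B 2 2000  C 3 2000
;

/* Same first step as the answer: keep one row per country-company-year. */
proc sort data=sample out=temp nodupkey;
   by country company year;
run;

/* Each remaining row is one firm-year, so rows per country = firm-years. */
proc freq data=temp noprint;
   tables country / out=firm_year_freq(drop=percent);
run;
```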

Answer 4

The answer to the first question (distinct companies per country) is:

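data lib.count(keep=country companyCount);
   set lib.data;                        /* must be sorted by country */
   by country;
   /* explicit lengths: RETAIN with '' alone would make companyList length 1 */
   length companyList $ 200 key $ 12;
   retain companyList '' companyCount 0;
   key = strip(put(company, best12.));  /* company as text, no leading blanks */
   if first.country then do;
      companyList = key;
      companyCount = 1;
   end;
   else do;
      /* FINDW matches whole words, so company 1 is not "found" inside 12 */
      if findw(companyList, strip(key), ', ') = 0 then do;
         companyList = catx(',', companyList, key);
         companyCount + 1;
      end;
   end;
   if last.country then output;
run;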

The result is:

Country  companyCount
-------  ------------
A        2
B        1
C        1

Similarly, you can get the number of distinct company-years in each country.
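A sketch of that company-year variant of the list-based DATA step. It assumes the input is sorted by country and uses a hypothetical WORK table `sample` with the question's rows (Date omitted); `fyKey`, `fyList` and `count_fy3` are illustrative names. The fixed length of `fyList` caps how many keys one country can hold, so for large data the sort-based or SQL-based answers scale better.

```sas
/* Question's sample rows (Date omitted) so the sketch runs on its own. */
data sample;
   input Country $ Company Year @@;
   datalines;
A 1 2000  A 1 2001  A 1 2001  A 1 2000  B 2 2000  C 3 2000
;

/* Input must be sorted by country; sample already is. */
data count_fy3(keep=country firmYearCount);
   set sample;
   by country;
   length fyList $ 1000 fyKey $ 40;
   retain fyList '' firmYearCount 0;
   fyKey = catx('_', company, year);          /* e.g. 1_2000 */
   if first.country then do;
      fyList = fyKey;
      firmYearCount = 1;
   end;
   else if findw(fyList, strip(fyKey), ', ') = 0 then do;
      /* new company-year for this country: remember it and count it */
      fyList = catx(',', fyList, fyKey);
      firmYearCount + 1;
   end;
   if last.country then output;
run;
```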

Answer 5

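I guess I'm a little confused about what you expect the result to look like. Here is an SQL approach that gets the same results as the other answers so far.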

data temp;
    attrib Country length = $10;
    attrib Company length = $10;
    attrib Year length = $10;
    attrib Date length = $10;
    infile datalines delimiter = '@';   /* INFILE must come before INPUT */
    input Country $ Company $ Year $ Date $;
    datalines;
A@1@x@x1@
A@1@x@x2@
B@2@x@x1@
C@3@x@x3@
;
run;


proc sql;
   create table temp2  as
   select country,  count(distinct Date) as count 
   from temp
   group by country,  company;
quit;
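This query returns one row per country-company pair rather than one number per country. A hedged sketch of how the same GROUP BY idea could give the requested per-country figure on the question's data: count the distinct years per company, then sum those counts per country. It uses a hypothetical WORK table `sample` with the question's rows (Date omitted); `firm_year_sum` and `n_years` are illustrative names.

```sas
/* Question's sample rows (Date omitted) so the sketch runs on its own. */
data sample;
   input Country $ Company Year @@;
   datalines;
A 1 2000  A 1 2001  A 1 2001  A 1 2000  B 2 2000  C 3 2000
;

proc sql;
   create table firm_year_sum as
   select country, sum(n_years) as firm_year
   from (select country, company,
                count(distinct year) as n_years   /* years per company */
         from sample
         group by country, company)
   group by country;                              /* add up per country */
quit;
```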

How to count distinct values over two dimensions using SAS