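I have a table in SAS with a date, a company name, and an industry category (1 to 49).
Is there some simple code that counts the number of companies in each industry on each date?
In other words, I need to count by industry category: how many times each category appears on each date.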
Answer 0 (score: 1)
Besides PROC FREQ, you can also use FIRST. and LAST. BY-group processing for this problem.
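A minimal sketch of the FIRST./LAST. approach, assuming the input data set is named `have`, the variables are named `date` and `industry` (names borrowed from the other answers, not confirmed by the question), and there is one row per date, industry, and company:

```sas
* sort so that rows of the same date/industry are adjacent;
proc sort data=have out=sorted;
  by date industry;
run;

data counts;
  set sorted;
  by date industry;
  * reset the counter at the start of each date/industry group;
  if first.industry then n_companies = 0;
  * sum statement: n_companies is retained across rows automatically;
  n_companies + 1;
  * write one row per group once its last row has been counted;
  if last.industry then output;
  keep date industry n_companies;
run;
```

If a company can appear more than once per date and industry, this counts rows rather than distinct companies, so the data would need to be de-duplicated first.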
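Answer 1 (score: 0)
A frequency table lists how many times each distinct combination of variable values occurs in a data set. Each combination is also called a "bin". The number of bins in a frequency table can be called its "cardinality", or the number of distinct values.
There are many ways to produce a frequency table in SAS.
PROC FREQ is a common starting point for simple groupings.
However, the question is
how many companies are in each industry on each date
so that means getting a cardinality count at a sub-level. SQL can do this in a single query: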
**** simulate data begin;
data companies;
  do companyId = 1 to 1000;
    industryId = ceil(49*ranuni(123));
    output;
  end;
run;

data have;
  format date yymmdd10.;
  do date = '01-jan-2016'd to '31-dec-2018'd;
    if weekday(date) in (1,7) then continue; * no activity on weekend;
    do _n_ = 1 to 50; * up to 50 random 'events' of random companies;
      if ranuni(123) < 0.60 then continue;
      if ranuni(123) < 0.05 then leave;
      eventId+1;
      point = ceil(1000*ranuni(123));
      set companies point=point;
      output;
    end;
  end;
  stop;
run;
**** simulate data end;
* number of companies within industry (way #1);
* use sub-select to compute the cardinality of company with respect to date/industry;
proc sql;
  create table counts1 (label="Number of companies per date/industry") as
  select
    date
  , industryId
  , count (distinct companyId) as number_of_companies
  from
  (
    select date, industryId, companyId, count(*) as number_of_company_events_on_date
    from have
    group by date, industryId, companyId
  )
  group by date, industryId
  ;
* number of companies within industry (way #2);
* use catx to construct the sub-level combination (bins) to be distinctly counted;
  create table counts1B as
  select
    date
  , industryId
  , count (distinct catx(':',industryId,companyId)) as number_of_companies
  from have
  group by date, industryId
  ;
* bonus: just number of industries (ignoring companies);
  create table counts2 (label="Number of industries per date") as
  select
    date
  , count (distinct industryId) as number_of_industries
  from have
  group by date
  ;
* bonus: disjoint counts of each category (company industry hierarchical relationship ignored);
  create table counts3 (label="Counts for industry and company by date") as
  select
    date
  , count (distinct industryId) as number_of_industries
  , count (distinct companyId) as number_of_companies
  from have
  group by date
  ;
quit;
Answer 2 (score: 0)
PROC FREQ is the simplest way to get the answer.
proc freq data=have;
tables date*industry / list missing;
run;
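This counts how many times each industry appears on a given date. If there is only one observation per date, industry, and company combination, it is also the number of companies in that industry on that date.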
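If the data can hold several observations for the same company on the same date, one possible workaround is to de-duplicate first. This is only a sketch: the company variable name `company` and the output data set names `unique_companies` and `counts` are assumptions, not taken from the question.

```sas
* keep one row per date/industry/company combination;
proc sort data=have out=unique_companies nodupkey;
  by date industry company;
run;

* count the remaining rows per date/industry and save them to a data set;
proc freq data=unique_companies noprint;
  tables date*industry / list missing out=counts (drop=percent);
run;
```

The `COUNT` variable in `counts` then holds the number of distinct companies per date and industry.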