How to select all rows from a SAS dataset that match at least one value in another SAS dataset

Time: 2015-08-31 00:49:38

Tags: sas

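I am trying to read the ticker names of 100 stocks/ETFs from a CSV file. I have two CSV files: the first contains 90 days of data for all stocks/ETFs, and the second contains the names of the 100 stock/ETF tickers I am interested in selecting. Below is my code. WORK.ETFnames is a one-column dataset containing the 100 ETF names I want to select from Fulldata. How can I use this list of names to select the data I want? In WORK.Fulldata, the names are stored in a column called "Ticker". I have already split the data by type (ETF or Stock), but I can't figure out how to select only the rows I am actually interested in from these tables. Thanks!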

PROC IMPORT OUT=WORK.Fulldata
   DATAFILE="/folders/myshortcuts/myfolder/q2_2012_all.csv"
   DBMS=CSV REPLACE;
   GETNAMES=YES;
   DATAROW=2;
RUN;

PROC IMPORT OUT = WORK.ETFnames
   DATAFILE = "/folders/myshortcuts/myfolder/ETFs.csv"
   DBMS=CSV REPLACE;
   GETNAMES=YES;
   DATAROW=2;
RUN;

PROC SQL;
   CREATE TABLE stocks AS
   SELECT *
   from Fulldata
   where Security EQ "Stock";
QUIT;

PROC SQL;
   CREATE TABLE ETF AS
   SELECT *
   from Fulldata
   where Security EQ "ETF";
QUIT;

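1 answer:

Answer 0 (score: 0)

You may want to try merging the two datasets and keeping only the rows with a matching "Ticker" value. I will assume that the names in the ETFnames dataset are also stored in a variable called "Ticker".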

  PROC IMPORT OUT= WORK.Fulldata
            DATAFILE= "/folders/myshortcuts/myfolder/q2_2012_all.csv"
            DBMS=CSV REPLACE;
     GETNAMES=YES;
     DATAROW=2;
   RUN;

   PROC IMPORT OUT= WORK.ETFnames
            DATAFILE= "/folders/myshortcuts/myfolder/ETFs.csv"
            DBMS=CSV REPLACE;
     GETNAMES=YES;
     DATAROW=2;
   RUN;

  PROC SORT DATA=WORK.Fulldata OUT=WORK.Fulldatasort;
        BY Ticker;
  RUN;

  PROC SORT DATA=WORK.ETFnames OUT=WORK.ETFnamessort;
        BY Ticker;
  RUN;

  DATA WORK.Partdata;
        MERGE WORK.Fulldatasort(in=F) WORK.ETFnamessort(in=A);
        BY Ticker;
        /* keep only tickers that appear in both datasets */
        IF A and F;
  RUN;

  PROC SQL;
  CREATE TABLE stocks AS
  SELECT *
  from Partdata
  Where Security EQ "Stock";
  QUIT;

  PROC SQL;
  CREATE TABLE ETF AS
  SELECT *
  from Partdata
  Where Security EQ "ETF";
  QUIT;

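As far as I can tell, this should give you the result you want. You could also use a join in a PROC SQL statement instead of MERGE, but MERGE is easier to write IMO.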
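
For reference, here is a rough sketch of what the PROC SQL route might look like. It uses an IN subquery rather than an explicit join, and it again assumes that ETFnames has a column called "Ticker" with the same type as the one in Fulldata. I have not run it against your files.

```sas
PROC SQL;
   /* keep every row of Fulldata whose Ticker appears in the name list */
   /* assumption: WORK.ETFnames has a column named Ticker */
   CREATE TABLE WORK.Partdata AS
   SELECT f.*
   FROM WORK.Fulldata AS f
   WHERE f.Ticker IN (SELECT Ticker FROM WORK.ETFnames);
QUIT;
```

No PROC SORT is needed with this approach. Because the subquery is only used as a membership test, a ticker listed twice in ETFnames would not duplicate rows in the output.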