For loop to create objects from a dataset

Time: 2016-08-28 04:09:34

Tags: r

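I need a for loop that creates data usable for time series analysis (Ref_Date ~ Value). For each Value, EST is its type, and PRI and SEAS need to be specified. However, the code does not produce the desired result. First, each object's name is only the first letter, whereas I want to convert "Non-profit institutions serving households' final consumption" to "NISHFC". Second, the observations are not added to the objects.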

CSV data

Ref_Date,GEO,PRI,SEAS,EST,Vector,Coordinate,Value
1981/03,Canada,Chained (2007) dollars,Seasonally adjusted at annual rates,"Final consumption expenditure (x 1,000,000)",v62305723,1.1.1.1,604670.000
1981/06,Canada,Chained (2007) dollars,Seasonally adjusted at annual rates,"Final consumption expenditure (x 1,000,000)",v62305723,1.1.1.1,603745.000
1981/09,Canada,Chained (2007) dollars,Seasonally adjusted at annual rates,"Final consumption expenditure (x 1,000,000)",v62305723,1.1.1.1,603415.000
1981/12,Canada,Chained (2007) dollars,Seasonally adjusted at annual rates,"Final consumption expenditure (x 1,000,000)",v62305723,1.1.1.1,604700.000
1982/03,Canada,Chained (2007) dollars,Seasonally adjusted at annual rates,"Final consumption expenditure (x 1,000,000)",v62305723,1.1.1.1,596566.000
1982/06,Canada,Chained (2007) dollars,Seasonally adjusted at annual rates,"Final consumption expenditure (x 1,000,000)",v62305723,1.1.1.1,594937.000
1982/09,Canada,Chained (2007) dollars,Seasonally adjusted at annual rates,"Final consumption expenditure (x 1,000,000)",v62305723,1.1.1.1,594907.000
1982/12,Canada,Chained (2007) dollars,Seasonally adjusted at annual rates,"Final consumption expenditure (x 1,000,000)",v62305723,1.1.1.1,593993.000
1983/03,Canada,Chained (2007) dollars,Seasonally adjusted at annual rates,"Final consumption expenditure (x 1,000,000)",v62305723,1.1.1.1,596617.000
1983/06,Canada,Chained (2007) dollars,Seasonally adjusted at annual rates,"Final consumption expenditure (x 1,000,000)",v62305723,1.1.1.1,604931.000
1983/09,Canada,Chained (2007) dollars,Seasonally adjusted at annual rates,"Final consumption expenditure (x 1,000,000)",v62305723,1.1.1.1,611881.000

Data as displayed in R

  Ref_Date    GEO                    PRI                                SEAS                                         EST    Vector Coordinate      Value
1  1981/03 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 604670.000
2  1981/06 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 603745.000
3  1981/09 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 603415.000
4  1981/12 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 604700.000
5  1982/03 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 596566.000
6  1982/06 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 594937.000

library(zoo);

require(ggplot2);
require(xts);
require(tseries);
require(timeDate);
require(forecast);

GDP = read.csv(
  "~/Desktop/GDP.csv"
  );

attach(GDP);

for (est in unique(EST)) {
  if (!grepl("(x 1,000,000)", est)) {
    string_list = strsplit(est, " ");
    name = "";
    for (string in string_list) {
      name = paste(substr(string,1,1), name, " ");
    }
    assign(toupper(name), GDP[which(EST==est & 
                             PRI=="Current prices" & 
                             SEAS=="Seasonally adjusted at annual rates"), 
                     c(1,8)]);
  }
}

Warnings:

Warning messages:
1: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
2: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
3: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
4: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
5: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
6: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
7: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
8: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
9: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
10: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
11: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
12: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
13: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
14: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
15: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
16: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
17: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
18: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
19: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
20: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
21: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
22: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
23: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
24: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
25: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
26: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name
27: In assign(toupper(name), GDP[which(EST == est & PRI ==  ... :
  only the first element is used as variable name

1 answer:

Answer 0 (score: 1)

Parsing the data in the format included in the question was actually harder than answering the question.

library(dplyr)
library(readr)

# text from the question
text_so = "1  1981/03 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 604670.000
2  1981/06 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 603745.000
3  1981/09 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 603415.000
4  1981/12 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 604700.000
5  1982/03 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 596566.000
6  1982/06 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 594937.000"

# read in fixed width format file
df_foo = readr::read_fwf(
  text_so, 
  fwf_positions(
    start = c(1, 4, 12, 19, 42, 78, 122, 135, 143),
    end = c(2, 11, 18, 41, 77, 121, 134, 142, 152),
    col_names = c("Serial #", "Ref_Date", "GEO", "PRI", "SEAS", "EST", "Vector", "Coordinate", "Value")
  )
)
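If the raw CSV from the question is available rather than the printed data frame, base R can read it directly, and the quoted EST field keeps its embedded commas. A minimal sketch, assuming the first three CSV rows from the question are pasted in as text (the names csv_text and GDP_csv are only illustrative):

```r
# Three rows copied verbatim from the CSV block in the question.
csv_text <- 'Ref_Date,GEO,PRI,SEAS,EST,Vector,Coordinate,Value
1981/03,Canada,Chained (2007) dollars,Seasonally adjusted at annual rates,"Final consumption expenditure (x 1,000,000)",v62305723,1.1.1.1,604670.000
1981/06,Canada,Chained (2007) dollars,Seasonally adjusted at annual rates,"Final consumption expenditure (x 1,000,000)",v62305723,1.1.1.1,603745.000
1981/09,Canada,Chained (2007) dollars,Seasonally adjusted at annual rates,"Final consumption expenditure (x 1,000,000)",v62305723,1.1.1.1,603415.000'

# stringsAsFactors = FALSE keeps the label columns as plain character vectors,
# so later string handling does not have to deal with factors.
GDP_csv <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(GDP_csv)
```

With a file on disk, the same call with a file path instead of text = should behave the same way.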

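Your problem (as far as I understand it) is easily solved with abbreviate: you seem to want to abbreviate EST uniquely within the groups formed by PRI and SEAS.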

# abbreviate EST uniquely within groups formed by PRI and SEAS 
df_foo %>% 
  group_by(PRI, SEAS) %>% 
  mutate(
    abbreviated_est = 
      toupper(
        abbreviate(gsub("\\(x 1,000,000\\)", "", EST),
                   use.classes = TRUE
        ) 
      )
  ) 

This produces:

Source: local data frame [6 x 10]
Groups: PRI, SEAS [1]

  Serial # Ref_Date    GEO                    PRI                                SEAS                                         EST    Vector Coordinate  Value abbreviated_est
     <int>    <chr>  <chr>                  <chr>                               <chr>                                       <chr>     <chr>      <chr>  <dbl>           <chr>
1        1  1981/03 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 604670            FNCE
2        2  1981/06 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 603745            FNCE
3        3  1981/09 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 603415            FNCE
4        4  1981/12 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 604700            FNCE
5        5  1982/03 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 596566            FNCE
6        6  1982/06 Canada Chained (2007) dollars Seasonally adjusted at annual rates Final consumption expenditure (x 1,000,000) v62305723    1.1.1.1 594937            FNCE
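
As for the warning in the question: strsplit returns a list, so for (string in string_list) runs only once, with the whole vector of words as string. substr then returns one first letter per word, paste without collapse keeps them as separate elements, and assign uses only the first element as the name. Note also that the sample rows have PRI equal to "Chained (2007) dollars", while the loop filters on "Current prices", so it may be worth checking that the filter values match the data exactly. If initials such as "NISHFC" are preferred over what abbreviate produces, a rough sketch follows. It assumes quarterly data starting in 1981 Q1, and make_acronym is a hypothetical helper, not something from the original post:

```r
# Hypothetical helper: drop the unit suffix, then join the upper-cased
# first letter of every space-separated word.
make_acronym <- function(label) {
  label <- sub(" *\\(x 1,000,000\\)$", "", label)
  words <- strsplit(label, " ", fixed = TRUE)[[1]]  # [[1]] unwraps the list
  words <- words[nzchar(words)]                     # drop empty pieces
  toupper(paste(substr(words, 1, 1), collapse = ""))
}

# Four rows copied from the question's sample data (unused columns omitted).
GDP <- data.frame(
  Ref_Date = c("1981/03", "1981/06", "1981/09", "1981/12"),
  PRI      = "Chained (2007) dollars",
  SEAS     = "Seasonally adjusted at annual rates",
  EST      = "Final consumption expenditure (x 1,000,000)",
  Value    = c(604670, 603745, 603415, 604700),
  stringsAsFactors = FALSE
)

# Filter once, then split into one data frame per EST label.
keep <- GDP$PRI == "Chained (2007) dollars" &
  GDP$SEAS == "Seasonally adjusted at annual rates"
sel <- GDP[keep, ]
series <- split(sel[, c("Ref_Date", "Value")], sel$EST)
names(series) <- vapply(names(series), make_acronym, character(1),
                        USE.NAMES = FALSE)

# Assumption: quarterly observations beginning in the first quarter of 1981.
fce_ts <- ts(series[["FCE"]]$Value, start = c(1981, 1), frequency = 4)
```

Keeping the pieces in a named list (series[["FCE"]]) avoids assign and attach entirely, and the list can be looped over with lapply when every series needs the same treatment.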