Best way to clean data in R and convert it to xts

Time: 2017-06-08 20:00:03

Tags: r xts grepl

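I'm trying to clean some data I downloaded from the web so that I can convert it to xts. I found some documentation on CRAN about using grepl to clean data, but I'm wondering whether there is a simpler way to do this than grepl. I'm hoping someone can help me clean this data with grepl or some other function in R. Thanks in advance for any help.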

  [1] "{"                                                                                 
  [2] "    \"Meta Data\": {"                                                              
  [3] "        \"1. Information\": \"Daily Prices (open, high, low, close) and Volumes\","
  [4] "        \"2. Symbol\": \"MSFT\","                                                  
  [5] "        \"3. Last Refreshed\": \"2017-06-08 15:15:00\","                           
  [6] "        \"4. Output Size\": \"Compact\","                                          
  [7] "        \"5. Time Zone\": \"US/Eastern\""     
  [8] "        },"                                                                        
  [9] "        \"2017-01-19\": {"                                                         
 [10] "            \"1. open\": \"62.2400\","                                             
 [11] "            \"2. high\": \"62.9800\","                                             
 [12] "            \"3. low\": \"62.1950\","                                              
 [13] "            \"4. close\": \"62.3000\","                                            
 [14] "            \"5. volume\": \"18451655\""                                           
 [15] "        },"                                                                        
 [16] "        \"2017-01-18\": {"                                                         
 [17] "            \"1. open\": \"62.6700\","                                             
 [18] "            \"2. high\": \"62.7000\","                                             
 [19] "            \"3. low\": \"62.1200\","                                              
 [20] "            \"4. close\": \"62.5000\","                                            
 [21] "            \"5. volume\": \"19670102\""                                           
 [22] "        },"                                                                        
 [23] "        \"2017-01-17\": {"                                                         
 [24] "            \"1. open\": \"62.6800\","                                             
 [25] "            \"2. high\": \"62.7000\","                                             
 [26] "            \"3. low\": \"62.0300\","                                              
 [27] "            \"4. close\": \"62.5300\","                                            
 [28] "            \"5. volume\": \"20663983\""                                           
 [29] "        }"                                                                         
 [30] "    }"                                                                             
 [31] "}"                                  

The desired final output for this data is:

            Open        High        Low        Close        Volume
2017-01-17  62.68       62.70       62.03       62.53       20663983
2017-01-18  62.67       62.70       62.12       62.50       19670102
2017-01-19  62.24       62.98       62.195      62.30       18451655

1 answer:

Answer 0 (score: 0)

As beigel suggested, the first thing you need to do is parse the JSON.

Lines <-
"{                                                                                 
  \"Meta Data\": {
    \"1. Information\": \"Daily Prices (open, high, low, close) and Volumes\",
    \"2. Symbol\": \"MSFT\",
    \"3. Last Refreshed\": \"2017-06-08 15:15:00\",
    \"4. Output Size\": \"Compact\",
    \"5. Time Zone\": \"US/Eastern\"
  },
  \"2017-01-19\": {
      \"1. open\": \"62.2400\",
      \"2. high\": \"62.9800\",
      \"3. low\": \"62.1950\",
      \"4. close\": \"62.3000\",
      \"5. volume\": \"18451655\"
  },
  \"2017-01-18\": {
      \"1. open\": \"62.6700\",
      \"2. high\": \"62.7000\",
      \"3. low\": \"62.1200\",
      \"4. close\": \"62.5000\",
      \"5. volume\": \"19670102\"
  },
  \"2017-01-17\": {
      \"1. open\": \"62.6800\",
      \"2. high\": \"62.7000\",
      \"3. low\": \"62.0300\",
      \"4. close\": \"62.5300\",
      \"5. volume\": \"20663983\"
  }
}"
parsedLines <- jsonlite::fromJSON(Lines)  # requires the jsonlite package

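Now that the data is in a usable structure, we can start cleaning it. Note that each element of parsedLines is itself a list. We'll convert each one to a vector with unlist, so that we have a list of vectors instead of a list of lists.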

parsedLines <- lapply(parsedLines, unlist)
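To see what lapply(..., unlist) does, here is a minimal sketch on a small hand-built list shaped like the parsed JSON. Only a few of the fields from the question are included, and the names parsed and flat are illustrative, not from the answer.

```r
# A cut-down, hand-built version of parsedLines (values from the question)
parsed <- list(
  "Meta Data"  = list("1. Information" = "Daily Prices (open, high, low, close) and Volumes",
                      "2. Symbol" = "MSFT"),
  "2017-01-19" = list("1. open" = "62.2400", "2. high" = "62.9800"),
  "2017-01-18" = list("1. open" = "62.6700", "2. high" = "62.7000")
)

# unlist() turns each inner list of length-1 strings into one named
# character vector; lapply() applies that to every element
flat <- lapply(parsed, unlist)

str(flat[["2017-01-19"]])
```

Each element of flat is now a named character vector, which is the shape rbind needs in the next step.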

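You may also have noticed that the first element of parsedLines is the metadata. We can attach it to the final object later. First, though, let's rbind all the other elements into a matrix. Using do.call lets us do this for a list of any length.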

 ohlcv <- do.call(rbind, parsedLines[-1])  # [-1] removes the first element
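A minimal sketch of the do.call() step, using two of the daily records from the question (the names days and m are illustrative):

```r
# Two daily records, shaped like the elements of parsedLines after unlist()
days <- list(
  "2017-01-19" = c("1. open" = "62.2400", "4. close" = "62.3000"),
  "2017-01-18" = c("1. open" = "62.6700", "4. close" = "62.5000")
)

# do.call() passes each list element to rbind() as a separate, named
# argument, so this is the same as writing out
# rbind(`2017-01-19` = days[[1]], `2017-01-18` = days[[2]]),
# but it works for a list of any length
m <- do.call(rbind, days)

# The list names become row names; the vector names become column names
print(m)
```

The row names are the dates, which is what as.xts() later uses to build the time index.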

Now we can clean up the column names and convert the data from character to numeric.

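# strip the leading digit and period (e.g. "1.") from the column names
colnames(ohlcv) <- gsub("^[[:digit:]]\\.", "", colnames(ohlcv))
# convert the character matrix to numeric
ohlcv <- type.convert(ohlcv, as.is = TRUE)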
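The pattern ^[[:digit:]]\\. stops at the period, so the space after it survives; that is why the str() output shows column names like " open". The sketch below uses a slightly longer pattern that also removes that space, and storage.mode<- as an alternative to type.convert(). Both are base R, but they are my substitutions rather than part of the original answer, and the small matrix is a hand-built stand-in for ohlcv.

```r
# A small character matrix shaped like ohlcv after do.call(rbind, ...)
ohlcv <- rbind(
  "2017-01-19" = c("1. open" = "62.2400", "5. volume" = "18451655"),
  "2017-01-18" = c("1. open" = "62.6700", "5. volume" = "19670102")
)

# The original pattern removes "1." but keeps the following space
print(gsub("^[[:digit:]]\\.", "", colnames(ohlcv)))  # " open" " volume"

# Also matching any whitespace after the period gives clean names
colnames(ohlcv) <- gsub("^[[:digit:]]+\\.[[:space:]]*", "", colnames(ohlcv))

# Changing the storage mode converts the strings to doubles in place,
# keeping the dim and dimnames attributes
storage.mode(ohlcv) <- "double"
print(ohlcv)
```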

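At this point, I would personally convert it to an xts object and attach the metadata. But you could also keep working with the ohlcv matrix, or convert it to a data.frame, tibble, etc.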

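library(xts)  # provides as.xts() and xtsAttributes()
# convert to xts; dateFormat = "Date" turns the row names into a Date index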
x <- as.xts(ohlcv, dateFormat = "Date")
# attach attributes
metadata <- parsedLines[[1]]
names(metadata) <- gsub("[[:digit:]]|\\.|[[:space:]]", "", names(metadata))
xtsAttributes(x) <- metadata
# view attributes
str(x)

An 'xts' object on 2017-01-17/2017-01-19 containing:
  Data: num [1:3, 1:5] 62.7 62.7 62.2 62.7 62.7 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:5] " open" " high" " low" " close" ...
  Indexed by objects of class: [Date] TZ: UTC
  xts Attributes:  
List of 5
 $ Information  : chr "Daily Prices (open, high, low, close) and Volumes"
 $ Symbol       : chr "MSFT"
 $ LastRefreshed: chr "2017-06-08 15:15:00"
 $ OutputSize   : chr "Compact"
 $ TimeZone     : chr "US/Eastern"
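The attribute names above (Information, LastRefreshed, and so on) come from stripping digits, periods and whitespace from the "Meta Data" names. A base-R sketch of just that step, where meta is an illustrative stand-in for parsedLines[[1]]:

```r
# The unlisted "Meta Data" element from the question
meta <- c("1. Information"    = "Daily Prices (open, high, low, close) and Volumes",
          "2. Symbol"         = "MSFT",
          "3. Last Refreshed" = "2017-06-08 15:15:00",
          "4. Output Size"    = "Compact",
          "5. Time Zone"      = "US/Eastern")

# Remove every digit, period and whitespace character from the names,
# leaving syntactic R names that are convenient as attribute names
names(meta) <- gsub("[[:digit:]]|\\.|[[:space:]]", "", names(meta))
print(names(meta))
```

Only the names change; the values, such as the symbol and time zone, are kept as they were.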