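I am trying to clean up some data I downloaded from the web and convert it to xts. I found some documentation on CRAN about cleaning data with grepl, but I would like to know whether there is an easier way to do this than grepl. I am hoping someone can help me clean this data using grepl or some other function in R. Thanks in advance for any help.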
[1] "{"
[2] " \"Meta Data\": {"
[3] " \"1. Information\": \"Daily Prices (open, high, low, close) and Volumes\","
[4] " \"2. Symbol\": \"MSFT\","
[5] " \"3. Last Refreshed\": \"2017-06-08 15:15:00\","
[6] " \"4. Output Size\": \"Compact\","
[7] " \"5. Time Zone\": \"US/Eastern\""
[8] " },"
[9] " \"2017-01-19\": {"
[10] " \"1. open\": \"62.2400\","
[11] " \"2. high\": \"62.9800\","
[12] " \"3. low\": \"62.1950\","
[13] " \"4. close\": \"62.3000\","
[14] " \"5. volume\": \"18451655\""
[15] " },"
[16] " \"2017-01-18\": {"
[17] " \"1. open\": \"62.6700\","
[18] " \"2. high\": \"62.7000\","
[19] " \"3. low\": \"62.1200\","
[20] " \"4. close\": \"62.5000\","
[21] " \"5. volume\": \"19670102\""
[22] " },"
[23] " \"2017-01-17\": {"
[24] " \"1. open\": \"62.6800\","
[25] " \"2. high\": \"62.7000\","
[26] " \"3. low\": \"62.0300\","
[27] " \"4. close\": \"62.5300\","
[28] " \"5. volume\": \"20663983\""
[29] " }"
[30] " }"
[31] "}"
The final output for this data should look like this:
Open High Low Close Volume
2017-01-17 62.68 62.70 62.03 62.53 20663983
2017-01-18 62.67 62.70 62.12 62.50 19670102
2017-01-19 62.24 62.98 62.195 62.30 18451655
Answer 0 (score: 0)
As beigel suggested, the first thing you need to do is parse the JSON.
Lines <-
"{
\"Meta Data\": {
\"1. Information\": \"Daily Prices (open, high, low, close) and Volumes\",
\"2. Symbol\": \"MSFT\",
\"3. Last Refreshed\": \"2017-06-08 15:15:00\",
\"4. Output Size\": \"Compact\",
\"5. Time Zone\": \"US/Eastern\"
},
\"2017-01-19\": {
\"1. open\": \"62.2400\",
\"2. high\": \"62.9800\",
\"3. low\": \"62.1950\",
\"4. close\": \"62.3000\",
\"5. volume\": \"18451655\"
},
\"2017-01-18\": {
\"1. open\": \"62.6700\",
\"2. high\": \"62.7000\",
\"3. low\": \"62.1200\",
\"4. close\": \"62.5000\",
\"5. volume\": \"19670102\"
},
\"2017-01-17\": {
\"1. open\": \"62.6800\",
\"2. high\": \"62.7000\",
\"3. low\": \"62.0300\",
\"4. close\": \"62.5300\",
\"5. volume\": \"20663983\"
}
}"
parsedLines <- jsonlite::fromJSON(Lines)
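Now that the data is in a usable structure, we can start cleaning it. Note that each element of parsedLines is itself a list. We convert each one to a vector with unlist, so we end up with a list of vectors instead of a list of lists.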
parsedLines <- lapply(parsedLines, unlist)
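To see what that step does in isolation, here is a minimal sketch on a toy nested list. The object names `nested` and `flat` are made up for illustration, and the list only mimics the shape of the parsed JSON; it is not the real parsedLines object.

```r
# Toy stand-in for the parsed JSON: a list of lists (hypothetical, illustrative only)
nested <- list(
  "2017-01-19" = list("1. open" = "62.2400", "2. high" = "62.9800"),
  "2017-01-18" = list("1. open" = "62.6700", "2. high" = "62.7000")
)

# unlist() collapses each inner list into a named character vector
flat <- lapply(nested, unlist)

str(flat)
```

Each element of `flat` is now a named character vector, which is the shape rbind needs in the next step.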
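You may have noticed that the first element of parsedLines is the metadata. We can attach that to the final object later. But first, let's rbind all the other elements into a single matrix. We can use do.call to do this for a list of any length.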
ohlcv <- do.call(rbind, parsedLines[-1]) # [-1] removes the first element
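As a rough illustration of why do.call works here: it passes every list element to rbind as a separate argument, and the list names become the row names. The sketch below uses a small hypothetical list (`vecs`) rather than the real data.

```r
# Hypothetical list of named character vectors, one per date
vecs <- list(
  "2017-01-19" = c(open = "62.2400", close = "62.3000"),
  "2017-01-18" = c(open = "62.6700", close = "62.5000")
)

# Equivalent to rbind("2017-01-19" = vecs[[1]], "2017-01-18" = vecs[[2]]),
# but works no matter how many elements the list has
m <- do.call(rbind, vecs)

m
```

The result is a character matrix with one row per date, which is why the date strings are later available as row names when converting to xts.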
Now we can clean up the column names and convert the data from character to numeric.
colnames(ohlcv) <- gsub("^[[:digit:]]\\.", "", colnames(ohlcv))
ohlcv <- type.convert(ohlcv, as.is = TRUE) # as.is = TRUE avoids the warning in recent R versions
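Note that the pattern above removes the digit and the dot but keeps the space that follows them, so the names come out as " open", " high", and so on. If you want names like the ones in the desired output in the question, one possible variation is sketched below. The vector `raw_names` is a hypothetical copy of the original column names, and the capitalisation step is my own assumption about the target format.

```r
# Hypothetical copy of the raw column names from the JSON
raw_names <- c("1. open", "2. high", "3. low", "4. close", "5. volume")

# Strip the leading digits, the dot, and any whitespace after it
clean <- gsub("^[[:digit:]]+\\.[[:space:]]*", "", raw_names)

# Capitalise the first letter of each name
clean <- paste0(toupper(substring(clean, 1, 1)), substring(clean, 2))

clean
# [1] "Open"   "High"   "Low"    "Close"  "Volume"
```

You could assign the result back with colnames(ohlcv) <- clean if that naming suits you better.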
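At this point, I would personally convert to an xts object and attach the metadata. But you could keep working with the ohlcv matrix, or convert it to a data.frame, tibble, etc.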
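If you would rather go the data.frame route, a minimal sketch might look like the following. It uses a small hypothetical numeric matrix (`ohlcv_demo`) with dates as row names, standing in for the real ohlcv matrix, and moves the dates into a proper Date column.

```r
# Hypothetical numeric matrix with date strings as row names
ohlcv_demo <- matrix(
  c(62.68, 62.67, 62.70, 62.70),
  nrow = 2,
  dimnames = list(c("2017-01-17", "2017-01-18"), c("open", "high"))
)

# Put the dates in their own column and drop the character row names
df <- data.frame(
  date = as.Date(rownames(ohlcv_demo)),
  ohlcv_demo,
  row.names = NULL
)

str(df)
```

Keeping the date as a real column tends to be more convenient for plotting or joining than leaving it in the row names.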
library(xts)
# convert to xts; the row names (date strings) become the Date index
x <- as.xts(ohlcv, dateFormat = "Date")
# attach attributes
metadata <- parsedLines[[1]]
names(metadata) <- gsub("[[:digit:]]|\\.|[[:space:]]", "", names(metadata))
xtsAttributes(x) <- metadata
# view attributes
str(x)
An 'xts' object on 2017-01-17/2017-01-19 containing:
Data: num [1:3, 1:5] 62.7 62.7 62.2 62.7 62.7 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] " open" " high" " low" " close" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 5
$ Information : chr "Daily Prices (open, high, low, close) and Volumes"
$ Symbol : chr "MSFT"
$ LastRefreshed: chr "2017-06-08 15:15:00"
$ OutputSize : chr "Compact"
$ TimeZone : chr "US/Eastern"