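I imported a JSON file with rjson and converted it to a data.frame, but all of the data is spread out horizontally, with the key information encoded in the column names.
library(rjson)
stations <- fromJSON(file = "station_information.json")
test <- as.data.frame(stations[3])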
This is what it looks like:
> dim(test)
[1] 2 5985
> test[1:27]
data.stations.station_id data.stations.name data.stations.short_name
1 72 W 52 St & 11 Ave 6926.01
2 72 W 52 St & 11 Ave 6926.01
data.stations.lat data.stations.lon data.stations.region_id
1 40.76727 -73.99393 71
2 40.76727 -73.99393 71
data.stations.rental_methods data.stations.capacity
1 KEY 39
2 CREDITCARD 39
data.stations.eightd_has_key_dispenser data.stations.station_id.1
1 FALSE 79
2 FALSE 79
data.stations.name.1 data.stations.short_name.1 data.stations.lat.1
1 Franklin St & W Broadway 5430.08 40.71912
2 Franklin St & W Broadway 5430.08 40.71912
data.stations.lon.1 data.stations.region_id.1 data.stations.rental_methods.1
1 -74.00667 71 KEY
2 -74.00667 71 CREDITCARD
data.stations.capacity.1 data.stations.eightd_has_key_dispenser.1
1 33 FALSE
2 33 FALSE
data.stations.station_id.2 data.stations.name.2 data.stations.short_name.2
1 82 St James Pl & Pearl St 5167.06
2 82 St James Pl & Pearl St 5167.06
data.stations.lat.2 data.stations.lon.2 data.stations.region_id.2
1 40.71117 -74.00017 71
2 40.71117 -74.00017 71
data.stations.rental_methods.2 data.stations.capacity.2
1 KEY 27
2 CREDITCARD 27
data.stations.eightd_has_key_dispenser.2
1 FALSE
2 FALSE
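As you can see, a simple transpose with t() or a melt() solution does not fix this. I would like to know what I did wrong when importing or converting to a data.frame that left me with a data frame in which what should be the row index is instead appended to the column names.
I have tried both of the following, but I am left with the same stretched data: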
plyr::ldply(stations, data.frame)
do.call(rbind, lapply(stations, data.frame, stringsAsFactors=FALSE))
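A likely reason both attempts return the same stretched data is that they iterate over the top-level list rather than over the inner list of station records. The sketch below is a minimal illustration under that assumption: `toy` is a hypothetical stand-in for the parsed JSON (nesting inferred from the `data.stations.` column prefix), and `one_row` is an illustrative helper, not part of rjson.

```r
# Toy stand-in for the parsed JSON; the real object is ASSUMED to have the
# same nesting (data -> stations -> one list per station).
toy <- list(
  last_updated = 1L,
  ttl = 10L,
  data = list(stations = list(
    list(station_id = "72", name = "W 52 St & 11 Ave", capacity = 39,
         rental_methods = c("KEY", "CREDITCARD")),
    list(station_id = "79", name = "Franklin St & W Broadway", capacity = 33,
         rental_methods = c("KEY", "CREDITCARD"))
  ))
)

# Turn one station record into a one-row data frame. Fields holding more
# than one value (here rental_methods) are collapsed into a single string,
# so they do not multiply the rows.
one_row <- function(rec) {
  flat <- lapply(rec, function(v) {
    if (length(v) > 1) paste(v, collapse = ", ") else v
  })
  as.data.frame(flat, stringsAsFactors = FALSE)
}

# Bind the per-station rows, iterating over the INNER list of stations.
long <- do.call(rbind, lapply(toy$data$stations, one_row))
dim(long)
```

With the real data, the same idea would be applied to `stations$data$stations`, assuming that path exists in the parsed object. Records containing empty (null) fields would need extra handling that this sketch does not cover.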
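In the end, I would like my output to look as if every 9 columns were "cut" and stacked under the first 9, leaving me with 665 rows and 9 columns (5985 / 9 = 665). Any suggestions are appreciated.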
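Note: I am getting the JSON directly from this link (it is not a large file).
Here is a reproducible example of the first 27 columns, which should be reshaped into a data frame with 3 rows and 9 columns: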
> dput(df)
structure(list(data.stations.station_id = structure(c(1L, 1L), class = "factor", .Label = "72"),
data.stations.name = structure(c(1L, 1L), class = "factor", .Label = "W 52 St & 11 Ave"),
data.stations.short_name = structure(c(1L, 1L), class = "factor", .Label = "6926.01"),
data.stations.lat = c(40.76727216, 40.76727216), data.stations.lon = c(-73.99392888,
-73.99392888), data.stations.region_id = c(71, 71), data.stations.rental_methods = structure(c(2L,
1L), .Label = c("CREDITCARD", "KEY"), class = "factor"),
data.stations.capacity = c(39, 39), data.stations.eightd_has_key_dispenser = c(FALSE,
FALSE), data.stations.station_id.1 = structure(c(1L, 1L), class = "factor", .Label = "79"),
data.stations.name.1 = structure(c(1L, 1L), class = "factor", .Label = "Franklin St & W Broadway"),
data.stations.short_name.1 = structure(c(1L, 1L), class = "factor", .Label = "5430.08"),
data.stations.lat.1 = c(40.71911552, 40.71911552), data.stations.lon.1 = c(-74.00666661,
-74.00666661), data.stations.region_id.1 = c(71, 71), data.stations.rental_methods.1 = structure(c(2L,
1L), .Label = c("CREDITCARD", "KEY"), class = "factor"),
data.stations.capacity.1 = c(33, 33), data.stations.eightd_has_key_dispenser.1 = c(FALSE,
FALSE), data.stations.station_id.2 = structure(c(1L, 1L), class = "factor", .Label = "82"),
data.stations.name.2 = structure(c(1L, 1L), class = "factor", .Label = "St James Pl & Pearl St"),
data.stations.short_name.2 = structure(c(1L, 1L), class = "factor", .Label = "5167.06"),
data.stations.lat.2 = c(40.71117416, 40.71117416), data.stations.lon.2 = c(-74.00016545,
-74.00016545), data.stations.region_id.2 = c(71, 71), data.stations.rental_methods.2 = structure(c(2L,
1L), .Label = c("CREDITCARD", "KEY"), class = "factor"),
data.stations.capacity.2 = c(27, 27), data.stations.eightd_has_key_dispenser.2 = c(FALSE,
FALSE)), .Names = c("data.stations.station_id", "data.stations.name",
"data.stations.short_name", "data.stations.lat", "data.stations.lon",
"data.stations.region_id", "data.stations.rental_methods", "data.stations.capacity",
"data.stations.eightd_has_key_dispenser", "data.stations.station_id.1",
"data.stations.name.1", "data.stations.short_name.1", "data.stations.lat.1",
"data.stations.lon.1", "data.stations.region_id.1", "data.stations.rental_methods.1",
"data.stations.capacity.1", "data.stations.eightd_has_key_dispenser.1",
"data.stations.station_id.2", "data.stations.name.2", "data.stations.short_name.2",
"data.stations.lat.2", "data.stations.lon.2", "data.stations.region_id.2",
"data.stations.rental_methods.2", "data.stations.capacity.2",
"data.stations.eightd_has_key_dispenser.2"), row.names = c(NA,
-2L), class = "data.frame")
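So the output structure should look like this (obviously the values would not be NA). Each row corresponds to the index number appended to the original data frame's column names.
> output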
data.stations.station_id data.stations.name data.stations.short_name
1 NA NA NA
2 NA NA NA
3 NA NA NA
data.stations.lat data.stations.lon data.stations.region_id
1 NA NA NA
2 NA NA NA
3 NA NA NA
data.stations.rental_methods data.stations.capacity
1 NA NA
2 NA NA
3 NA NA
data.stations.eightd_has_key_dispenser
1 NA
2 NA
3 NA
Answer 0 (score: 1)
I would try:
library(data.table)
rbindlist(lapply(split(seq_along(df), c(0, (seq_along(df)%/%9)[-length(df)])),
function(x) df[, x]), use.names = FALSE)
## data.stations.station_id data.stations.name data.stations.short_name data.stations.lat
## 1: 72 W 52 St & 11 Ave 6926.01 40.76727
## 2: 72 W 52 St & 11 Ave 6926.01 40.76727
## 3: 79 Franklin St & W Broadway 5430.08 40.71912
## 4: 79 Franklin St & W Broadway 5430.08 40.71912
## 5: 82 St James Pl & Pearl St 5167.06 40.71117
## 6: 82 St James Pl & Pearl St 5167.06 40.71117
## data.stations.lon data.stations.region_id data.stations.rental_methods
## 1: -73.99393 71 KEY
## 2: -73.99393 71 CREDITCARD
## 3: -74.00667 71 KEY
## 4: -74.00667 71 CREDITCARD
## 5: -74.00017 71 KEY
## 6: -74.00017 71 CREDITCARD
## data.stations.capacity data.stations.eightd_has_key_dispenser
## 1: 39 FALSE
## 2: 39 FALSE
## 3: 33 FALSE
## 4: 33 FALSE
## 5: 27 FALSE
## 6: 27 FALSE
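The grouping vector passed to `split()` can be checked on its own. The sketch below uses a hypothetical toy data frame with 2 fields per station instead of 9 (`k`, `grp_answer` and `grp_simple` are illustrative names). It compares the index used in this answer with the arguably simpler `(seq_along(df) - 1) %/% k`, and does the stacking in base R rather than with `rbindlist()`, a swap made here only to avoid the package dependency.

```r
# Toy wide data frame: 3 stations x 2 fields, laid out side by side.
df <- data.frame(id = "72", cap = 39,
                 id.1 = "79", cap.1 = 33,
                 id.2 = "82", cap.2 = 27,
                 stringsAsFactors = FALSE)
k <- 2  # fields per station (9 in the question)

# Index as written in the answer, and an equivalent shorter form.
grp_answer <- c(0, (seq_along(df) %/% k)[-length(df)])
grp_simple <- (seq_along(df) - 1) %/% k

# Split the column positions by group, give every chunk the same names
# (base rbind matches data frame columns by name), then stack the chunks.
chunks <- lapply(split(seq_along(df), grp_simple), function(i) {
  setNames(df[, i, drop = FALSE], names(df)[seq_len(k)])
})
stacked <- do.call(rbind, chunks)
rownames(stacked) <- NULL
stacked
```

Because each chunk stays a data frame, the numeric column keeps its type, which is the same benefit the `rbindlist()` version has over going through a matrix.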
In other words, the rbindlist() call creates a list of data.frames, each with 9 columns, and rbinds them together. This way, you do not run into the data coercion problems you would get when converting to a matrix.
The result is a 6-row by 9-column data.table. I am not sure what rule you want to use to drop rows so that you end up with only 3 rows...
But I think you are trying to solve a problem that does not exist. Try reading the data like this:
library(jsonlite)
x <- fromJSON("https://gbfs.citibikenyc.com/gbfs/en/station_information.json")
head(x[[3]]$stations)
## station_id name short_name lat lon region_id
## 1 72 W 52 St & 11 Ave 6926.01 40.76727 -73.99393 71
## 2 79 Franklin St & W Broadway 5430.08 40.71912 -74.00667 71
## 3 82 St James Pl & Pearl St 5167.06 40.71117 -74.00017 71
## 4 83 Atlantic Ave & Fort Greene Pl 4354.07 40.68383 -73.97632 71
## 5 116 W 17 St & 8 Ave 6148.02 40.74178 -74.00150 71
## 6 119 Park Ave & St Edwards St 4700.06 40.69609 -73.97803 71
## rental_methods capacity eightd_has_key_dispenser
## 1 KEY, CREDITCARD 39 FALSE
## 2 KEY, CREDITCARD 33 FALSE
## 3 KEY, CREDITCARD 27 FALSE
## 4 KEY, CREDITCARD 62 FALSE
## 5 KEY, CREDITCARD 39 FALSE
## 6 KEY, CREDITCARD 19 FALSE
dim(x[[3]]$stations)
# [1] 665 9
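In the jsonlite output, `rental_methods` prints as "KEY, CREDITCARD" inside a single cell, which suggests that it comes back as a list column (one character vector per station). If a plain character column is preferred, something like the following may work. This is a minimal sketch: `st` is a hypothetical stand-in for `x[[3]]$stations`, and the list-column assumption is inferred from the printed output, not verified against the live feed.

```r
# Hypothetical stand-in for x[[3]]$stations, with rental_methods stored as a
# list column (ASSUMED from how the jsonlite result prints).
st <- data.frame(station_id = c("72", "79"),
                 capacity = c(39, 33),
                 stringsAsFactors = FALSE)
st$rental_methods <- list(c("KEY", "CREDITCARD"), "KEY")

# Collapse each list element into one comma-separated string, so the column
# becomes an ordinary character vector with one entry per station.
st$rental_methods <- vapply(st$rental_methods, paste, character(1),
                            collapse = ", ")
str(st)
```

This keeps one row per station, in contrast to the rjson route in the question, where the two rental methods produced two rows.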
Answer 1 (score: 1)
You can use a matrix, but make sure all factor columns are converted to character first, i.e.
ind <- sapply(df, is.factor)
df[ind] <- lapply(df[ind], as.character)
final_df <- as.data.frame(matrix(unlist(df), ncol = 9, byrow = TRUE))
final_df[c(TRUE, FALSE),]
# V1 V2 V3 V4 V5 V6 V7 V8 V9
#1 72 72 W 52 St & 11 Ave W 52 St & 11 Ave 6926.01 6926.01 40.76727216 40.76727216 -73.99392888
#3 79 79 Franklin St & W Broadway Franklin St & W Broadway 5430.08 5430.08 40.71911552 40.71911552 -74.00666661
#5 82 82 St James Pl & Pearl St St James Pl & Pearl St 5167.06 5167.06 40.71117416 40.71117416 -74.00016545
On the other hand, as @A5C1D2H2I1M1N2O1R2T1 pointed out, you may be looking for this:
as.data.frame(matrix(c(t(df)), ncol = 9, byrow = TRUE))
# V1 V2 V3 V4 V5 V6 V7 V8 V9
#1 72 W 52 St & 11 Ave 6926.01 40.76727 -73.99393 71 KEY 39 FALSE
#2 79 Franklin St & W Broadway 5430.08 40.71912 -74.00667 71 KEY 33 FALSE
#3 82 St James Pl & Pearl St 5167.06 40.71117 -74.00017 71 KEY 27 FALSE
#4 72 W 52 St & 11 Ave 6926.01 40.76727 -73.99393 71 CREDITCARD 39 FALSE
#5 79 Franklin St & W Broadway 5430.08 40.71912 -74.00667 71 CREDITCARD 33 FALSE
#6 82 St James Pl & Pearl St 5167.06 40.71117 -74.00017 71 CREDITCARD 27 FALSE
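Going through a matrix leaves every column as character and named V1, V2, and so on. One way to restore the original names and guess the types back is sketched below on a hypothetical toy data frame with 3 fields per station (`k` and the column names are illustrative). Note that `t(df)` on mixed-type data formats numeric columns as text, so digits can be lost, which is why the output above shows 40.76727 rather than 40.76727216.

```r
# Toy wide data frame: 2 stations x 3 fields, 2 rows (one per rental method).
df <- data.frame(
  id = c("72", "72"), lat = c(40.76727, 40.76727),
  method = c("KEY", "CREDITCARD"),
  id.1 = c("79", "79"), lat.1 = c(40.71912, 40.71912),
  method.1 = c("KEY", "CREDITCARD"),
  stringsAsFactors = FALSE
)
k <- 3  # fields per station (9 in the question)

# Same reshaping as in the answer: flatten row by row, refill k columns wide.
m <- matrix(c(t(df)), ncol = k, byrow = TRUE)
out <- as.data.frame(m, stringsAsFactors = FALSE)

# Restore the first k original names, then let type.convert() guess the
# column types back from the character data.
names(out) <- names(df)[seq_len(k)]
out[] <- lapply(out, type.convert, as.is = TRUE)
str(out)
```

If full numeric precision matters, stacking the data frame chunks directly (without a matrix) avoids the round trip through text.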