Question

I have a .json file (over 100,000 lines) that contains the following information:

POST /log?lat=36.804121354&lon=-1.270256482&time=2016-05-18T17:39:59.004Z
{ 'content-type': 'application/x-www-form-urlencoded',
  'content-length': '29',
  host: 'ip_address:port',
  connection: 'Keep-Alive',
  'accept-encoding': 'gzip',
  'user-agent': 'okhttp/3.7.0' }
BODY: lat=36.804121354&lon=-1.270256482

POST /log?lat=36.804123256&lon=-1.270254711&time=2016-05-18T17:40:13.004Z
{ 'content-type': 'application/x-www-form-urlencoded',
  'content-length': '29',
  host: 'ip_address:port',
  connection: 'Keep-Alive',
  'accept-encoding': 'gzip',
  'user-agent': 'okhttp/3.7.0' }
BODY: lat=36.804123256&lon=-1.270254711

POST /log?lat=36.804124589&lon=-1.270255641&time=2016-05-18T17:41:05.004Z
{ 'content-type': 'application/x-www-form-urlencoded',
  'content-length': '29',
  host: 'ip_address:port',
  connection: 'Keep-Alive',
  'accept-encoding': 'gzip',
  'user-agent': 'okhttp/3.7.0' }
BODY: lat=36.804124589&lon=-1.270255641

.......

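This block repeats with updated latitude, longitude and time values. Using R, how can I extract the latitude, longitude and time from this file and store them in a data frame like the following?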

id  lat           lon            time
1   36.804121354  -1.270256482   2016-05-18 17:39:59
2   36.804123256  -1.270254711   2016-05-18 17:40:13
3   36.804124589  -1.270255641   2016-05-18 17:41:05

Answer 1

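It looks like your data isn't strictly JSON. Since all of the data you want is contained in the "POST" lines, one solution is to pull out just those lines and then parse them.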

#Read lines
x<-readLines("test.txt")
#Find lines beginning with "POST"
posts<-x[grep("^POST", x)]
#Remove the prefix: "POST /log?"
posts<-sub("^POST /log\\?", "", posts)
#split remaining fields on the &
fields<-unlist(strsplit(posts, "\\&"))

#remove the prefixes ("lat=", "lon=", "time=")
fields<-sub("^.*=", "", fields)

#make a dataframe (assume the fields are always in the same order)
df<-as.data.frame(matrix(fields, ncol=3, byrow=TRUE), stringsAsFactors = FALSE)
names(df)<-c("lat", "lon", "time") 
#convert the columns to the proper type.
df$lat<-as.numeric(df$lat)
df$lon<-as.numeric(df$lon)
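#name the format argument explicitly (passed positionally it only works by accident);
#the trailing ".004Z" after the seconds is ignored by the parser
df$time<-as.POSIXct(df$time, format="%Y-%m-%dT%H:%M:%S", tz="UTC")
#add a running id column, as in the desired output
df<-cbind(id=seq_len(nrow(df)), df)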
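If the order of `lat`, `lon` and `time` in the query string cannot be guaranteed, the positional `matrix(..., ncol=3)` step above will silently misalign rows. A minimal sketch, assuming every request line has the form `POST /log?key=value&...` as in the sample, looks each field up by name with base R's `regexec()`/`regmatches()` instead, fills in `NA` when a field is missing, and uses `%OS` so the milliseconds (`.004`) are kept. `parse_posts()` is a hypothetical helper name, and the inline `lines` vector only stands in for `readLines("test.txt")`.

```r
# Hypothetical helper: extract lat, lon and time by field NAME, not position.
# Assumes request lines look like: POST /log?lat=<num>&lon=<num>&time=<ISO 8601>
parse_posts <- function(lines) {
  # keep only the request lines that carry the query string
  posts <- lines[startsWith(lines, "POST /log?")]

  # value of "key=..." up to the next '&' or whitespace; NA if the key is absent
  get_field <- function(key) {
    pattern <- paste0("[?&]", key, "=([^&[:space:]]+)")
    m <- regmatches(posts, regexec(pattern, posts))
    vapply(m, function(g) if (length(g) == 2) g[2] else NA_character_,
           character(1))
  }

  data.frame(
    id   = seq_along(posts),
    lat  = as.numeric(get_field("lat")),
    lon  = as.numeric(get_field("lon")),
    # %OS reads fractional seconds on input; the trailing "Z" is ignored
    time = as.POSIXct(get_field("time"), format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
  )
}

# Small stand-in for readLines("test.txt"), taken from the sample above
lines <- c(
  "POST /log?lat=36.804121354&lon=-1.270256482&time=2016-05-18T17:39:59.004Z",
  "{ 'content-type': 'application/x-www-form-urlencoded',",
  "BODY: lat=36.804121354&lon=-1.270256482",
  "",
  "POST /log?lat=36.804123256&lon=-1.270254711&time=2016-05-18T17:40:13.004Z"
)
df <- parse_posts(lines)
print(df)
```

On the real file this would be `parse_posts(readLines("test.txt"))`. If whole seconds are enough, as in the desired output, `%S` can be used in place of `%OS`.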

R: Extract latitude, longitude and time from a JSON file