I have a .json file (more than 100,000 lines) that contains the following information:
POST /log?lat=36.804121354&lon=-1.270256482&time=2016-05-18T17:39:59.004Z
{ 'content-type': 'application/x-www-form-urlencoded',
'content-length': '29',
host: 'ip_address:port',
connection: 'Keep-Alive',
'accept-encoding': 'gzip',
'user-agent': 'okhttp/3.7.0' }
BODY: lat=36.804121354&lon=-1.270256482
POST /log?lat=36.804123256&lon=-1.270254711&time=2016-05-18T17:40:13.004Z
{ 'content-type': 'application/x-www-form-urlencoded',
'content-length': '29',
host: 'ip_address:port',
connection: 'Keep-Alive',
'accept-encoding': 'gzip',
'user-agent': 'okhttp/3.7.0' }
BODY: lat=36.804123256&lon=-1.270254711
POST /log?lat=36.804124589&lon=-1.270255641&time=2016-05-18T17:41:05.004Z
{ 'content-type': 'application/x-www-form-urlencoded',
'content-length': '29',
host: 'ip_address:port',
connection: 'Keep-Alive',
'accept-encoding': 'gzip',
'user-agent': 'okhttp/3.7.0' }
BODY: lat=36.804124589&lon=-1.270255641
.......
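The block above repeats with updated latitude, longitude, and time values. Using R, how can I extract the latitude, longitude, and time from this file and store them in a data frame like the one below?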
id lat lon time
1 36.804121354 -1.270256482 2016-05-18 17:39:59
2 36.804123256 -1.270254711 2016-05-18 17:40:13
3 36.804124589 -1.270255641 2016-05-18 17:41:05
Answer 0 (score: 2)
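It looks like your data is not strictly JSON. Since all of the requested data is contained in the "POST" lines, one solution is to keep only those lines and then parse them.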
#Read lines
x<-readLines("test.txt")
#Find lines beginning with "POST"
posts<-x[grep("^POST", x)]
#Remove the prefix: "POST /log?"
posts<-sub("^POST /log\\?", "", posts)
#split remaining fields on the &
fields<-unlist(strsplit(posts, "\\&"))
#remove the prefixes ("lat=", "lon=", "time=")
fields<-sub("^.*=", "", fields)
#make a dataframe (assume the fields are always in the same order)
df<-as.data.frame(matrix(fields, ncol=3, byrow=TRUE), stringsAsFactors = FALSE)
names(df)<-c("lat", "lon", "time")
#convert the columns to the proper type.
df$lat<-as.numeric(df$lat)
df$lon<-as.numeric(df$lon)
df$time<-as.POSIXct(df$time, format="%Y-%m-%dT%H:%M:%S", tz="UTC")
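If the order of the query parameters cannot be guaranteed, a variation on the same idea is to look up each field by name instead of by position. The following is only a sketch under a few assumptions: `get_param` is a hypothetical helper written for this example (not a base R function), the sample vector `x` stands in for `readLines("test.txt")`, and the `id` column is simply the row number, as in the desired output.

```r
# Sample lines standing in for readLines("test.txt"); values copied from the question.
x <- c(
  "POST /log?lat=36.804121354&lon=-1.270256482&time=2016-05-18T17:39:59.004Z",
  "{ 'content-type': 'application/x-www-form-urlencoded',",
  "BODY: lat=36.804121354&lon=-1.270256482",
  "POST /log?lat=36.804123256&lon=-1.270254711&time=2016-05-18T17:40:13.004Z"
)

# Keep only the request lines.
posts <- grep("^POST ", x, value = TRUE)

# Hypothetical helper: return the value of one query parameter per line,
# or NA when the parameter is missing from that line.
get_param <- function(lines, key) {
  pattern <- paste0("[?&]", key, "=([^& ]*)")
  hits <- regmatches(lines, regexec(pattern, lines))
  vapply(hits,
         function(hit) if (length(hit) == 2) hit[2] else NA_character_,
         character(1))
}

# Build the data frame, converting each column to its proper type.
df <- data.frame(
  id   = seq_along(posts),
  lat  = as.numeric(get_param(posts, "lat")),
  lon  = as.numeric(get_param(posts, "lon")),
  time = as.POSIXct(get_param(posts, "time"),
                    format = "%Y-%m-%dT%H:%M:%S", tz = "UTC"),
  stringsAsFactors = FALSE
)
```

Note that R prints numbers with 7 significant digits by default, so `lat` and `lon` may look truncated on screen even though the full values are stored; `print(df, digits = 12)` shows more digits.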