在R中解析正则表达式

时间:2017-11-28 20:38:53

标签: json r regex

我是正则表达式的新手,在payload.hours中有一些JSON数据:

enter image description here 这些代表不同地方一周中每一天的营业时间。将这些数据提取到同一数据框的新列中最简洁的方法是什么?我希望将每一天的工作时间提取到一个单独的列中。

在此处附加数据 - :

structure(list(payload.name = c("Windsor Castle", "Penrhyn Castle", 
"Barcaldine Castle", "University of Birmingham", "Kensington Gardens", 
"Bristol Zoo Gardens", "Alton Towers Resort", "The Royal Shakespeare Company", 
"University of Surrey", "Hampton Court Palace", "BFI Imax", "Usher Gallery", 
"The Cambridge Chop House", "The Balmoral Hotel", "Scottish National Portrait Gallery", 
"Old Trafford Stadium", "Exeter Cathedral", "Excel London", "Cavern Club"
), payload.hours_display = c("Mon-Sat 11:30 AM-11:00 PM; Sun 12:00 PM-11:00 PM", 
"Open Daily 11:00 AM-5:00 PM", "Open Daily 9:00 AM-10:00 PM", 
"Mon-Fri 8:00 AM-7:00 PM; Sat-Sun 10:00 AM-3:00 PM", "Open Daily 5:00 AM-11:59 PM", 
"Open Daily 9:00 AM-5:00 PM", "Open Daily 10:30 AM-5:00 PM", 
"Mon-Sat 11:30 AM-11:00 PM; Sun 11:30 AM-4:30 PM", "Mon-Fri 8:30 AM-11:00 PM; Sat-Sun 10:00 AM-11:00 PM", 
"Mon-Sat 12:00 PM-11:00 PM; Sun 12:00 PM-10:30 PM", "Mon 1:30 PM-11:00 PM; Tue 10:30 AM-11:00 PM; Wed-Fri 1:30 PM-11:00 PM; Sat-Sun 11:00 AM-11:00 PM", 
"Open Daily 10:00 AM-4:00 PM", "Mon-Thu 12:00 PM-10:30 PM; Fri-Sat 12:00 PM-11:00 PM; Sun 12:00 PM-9:30 PM", 
"Open Daily 12:00 AM-12:00 AM", "Mon-Wed 10:00 AM-5:00 PM; Thu 10:00 AM-7:00 PM; Fri-Sun 10:00 AM-5:00 PM", 
"Open Daily 9:30 AM-5:00 PM", "Mon-Sat 9:30 AM-4:45 PM; Sun 11:30 AM-3:30 PM", 
"Open Daily 5:30 AM-10:00 PM", "Mon 12:00 AM-12:30 AM, 11:00 AM-8:00 PM; Tue-Wed 11:00 AM-8:00 PM; Thu 11:00 AM-11:59 PM; Fri 12:00 AM-2:00 AM, 11:00 AM-11:59 PM; Sat-Sun 12:00 AM-2:30 AM, 11:00 AM-11:59 PM"
), payload.hours = c("{\"monday\":[[\"11:30\",\"23:00\"]],\"tuesday\":[[\"11:30\",\"23:00\"]],\"wednesday\":[[\"11:30\",\"23:00\"]],\"thursday\":[[\"11:30\",\"23:00\"]],\"friday\":[[\"11:30\",\"23:00\"]],\"saturday\":[[\"11:30\",\"23:00\"]],\"sunday\":[[\"12:00\",\"23:00\"]]}", 
"{\"monday\":[[\"11:00\",\"17:00\"]],\"tuesday\":[[\"11:00\",\"17:00\"]],\"wednesday\":[[\"11:00\",\"17:00\"]],\"thursday\":[[\"11:00\",\"17:00\"]],\"friday\":[[\"11:00\",\"17:00\"]],\"saturday\":[[\"11:00\",\"17:00\"]],\"sunday\":[[\"11:00\",\"17:00\"]]}", 
"{\"monday\":[[\"9:00\",\"22:00\"]],\"tuesday\":[[\"9:00\",\"22:00\"]],\"wednesday\":[[\"9:00\",\"22:00\"]],\"thursday\":[[\"9:00\",\"22:00\"]],\"friday\":[[\"9:00\",\"22:00\"]],\"saturday\":[[\"9:00\",\"22:00\"]],\"sunday\":[[\"9:00\",\"22:00\"]]}", 
"{\"monday\":[[\"8:00\",\"19:00\"]],\"tuesday\":[[\"8:00\",\"19:00\"]],\"wednesday\":[[\"8:00\",\"19:00\"]],\"thursday\":[[\"8:00\",\"19:00\"]],\"friday\":[[\"8:00\",\"19:00\"]],\"saturday\":[[\"10:00\",\"15:00\"]],\"sunday\":[[\"10:00\",\"15:00\"]]}", 
"{\"monday\":[[\"5:00\",\"23:59\"]],\"tuesday\":[[\"5:00\",\"23:59\"]],\"wednesday\":[[\"5:00\",\"23:59\"]],\"thursday\":[[\"5:00\",\"23:59\"]],\"friday\":[[\"5:00\",\"23:59\"]],\"saturday\":[[\"5:00\",\"23:59\"]],\"sunday\":[[\"5:00\",\"23:59\"]]}", 
"{\"monday\":[[\"9:00\",\"17:00\"]],\"tuesday\":[[\"9:00\",\"17:00\"]],\"wednesday\":[[\"9:00\",\"17:00\"]],\"thursday\":[[\"9:00\",\"17:00\"]],\"friday\":[[\"9:00\",\"17:00\"]],\"saturday\":[[\"9:00\",\"17:00\"]],\"sunday\":[[\"9:00\",\"17:00\"]]}", 
"{\"monday\":[[\"10:30\",\"17:00\"]],\"tuesday\":[[\"10:30\",\"17:00\"]],\"wednesday\":[[\"10:30\",\"17:00\"]],\"thursday\":[[\"10:30\",\"17:00\"]],\"friday\":[[\"10:30\",\"17:00\"]],\"saturday\":[[\"10:30\",\"17:00\"]],\"sunday\":[[\"10:30\",\"17:00\"]]}", 
"{\"monday\":[[\"11:30\",\"23:00\"]],\"tuesday\":[[\"11:30\",\"23:00\"]],\"wednesday\":[[\"11:30\",\"23:00\"]],\"thursday\":[[\"11:30\",\"23:00\"]],\"friday\":[[\"11:30\",\"23:00\"]],\"saturday\":[[\"11:30\",\"23:00\"]],\"sunday\":[[\"11:30\",\"16:30\"]]}", 
"{\"monday\":[[\"8:30\",\"23:00\"]],\"tuesday\":[[\"8:30\",\"23:00\"]],\"wednesday\":[[\"8:30\",\"23:00\"]],\"thursday\":[[\"8:30\",\"23:00\"]],\"friday\":[[\"8:30\",\"23:00\"]],\"saturday\":[[\"10:00\",\"23:00\"]],\"sunday\":[[\"10:00\",\"23:00\"]]}", 
"{\"monday\":[[\"12:00\",\"23:00\"]],\"tuesday\":[[\"12:00\",\"23:00\"]],\"wednesday\":[[\"12:00\",\"23:00\"]],\"thursday\":[[\"12:00\",\"23:00\"]],\"friday\":[[\"12:00\",\"23:00\"]],\"saturday\":[[\"12:00\",\"23:00\"]],\"sunday\":[[\"12:00\",\"22:30\"]]}", 
"{\"monday\":[[\"13:30\",\"23:00\"]],\"tuesday\":[[\"10:30\",\"23:00\"]],\"wednesday\":[[\"13:30\",\"23:00\"]],\"thursday\":[[\"13:30\",\"23:00\"]],\"friday\":[[\"13:30\",\"23:00\"]],\"saturday\":[[\"11:00\",\"23:00\"]],\"sunday\":[[\"11:00\",\"23:00\"]]}", 
"{\"monday\":[[\"10:00\",\"16:00\"]],\"tuesday\":[[\"10:00\",\"16:00\"]],\"wednesday\":[[\"10:00\",\"16:00\"]],\"thursday\":[[\"10:00\",\"16:00\"]],\"friday\":[[\"10:00\",\"16:00\"]],\"saturday\":[[\"10:00\",\"16:00\"]],\"sunday\":[[\"10:00\",\"16:00\"]]}", 
"{\"monday\":[[\"12:00\",\"22:30\"]],\"tuesday\":[[\"12:00\",\"22:30\"]],\"wednesday\":[[\"12:00\",\"22:30\"]],\"thursday\":[[\"12:00\",\"22:30\"]],\"friday\":[[\"12:00\",\"23:00\"]],\"saturday\":[[\"12:00\",\"23:00\"]],\"sunday\":[[\"12:00\",\"21:30\"]]}", 
"{\"monday\":[[\"00:00\",\"24:00\"]],\"tuesday\":[[\"00:00\",\"24:00\"]],\"wednesday\":[[\"00:00\",\"24:00\"]],\"thursday\":[[\"00:00\",\"24:00\"]],\"friday\":[[\"00:00\",\"24:00\"]],\"saturday\":[[\"00:00\",\"24:00\"]],\"sunday\":[[\"00:00\",\"24:00\"]]}", 
"{\"monday\":[[\"10:00\",\"17:00\"]],\"tuesday\":[[\"10:00\",\"17:00\"]],\"wednesday\":[[\"10:00\",\"17:00\"]],\"thursday\":[[\"10:00\",\"19:00\"]],\"friday\":[[\"10:00\",\"17:00\"]],\"saturday\":[[\"10:00\",\"17:00\"]],\"sunday\":[[\"10:00\",\"17:00\"]]}", 
"{\"monday\":[[\"9:30\",\"17:00\"]],\"tuesday\":[[\"9:30\",\"17:00\"]],\"wednesday\":[[\"9:30\",\"17:00\"]],\"thursday\":[[\"9:30\",\"17:00\"]],\"friday\":[[\"9:30\",\"17:00\"]],\"saturday\":[[\"9:30\",\"17:00\"]],\"sunday\":[[\"9:30\",\"17:00\"]]}", 
"{\"monday\":[[\"9:30\",\"16:45\"]],\"tuesday\":[[\"9:30\",\"16:45\"]],\"wednesday\":[[\"9:30\",\"16:45\"]],\"thursday\":[[\"9:30\",\"16:45\"]],\"friday\":[[\"9:30\",\"16:45\"]],\"saturday\":[[\"9:30\",\"16:45\"]],\"sunday\":[[\"11:30\",\"15:30\"]]}", 
"{\"monday\":[[\"5:30\",\"22:00\"]],\"tuesday\":[[\"5:30\",\"22:00\"]],\"wednesday\":[[\"5:30\",\"22:00\"]],\"thursday\":[[\"5:30\",\"22:00\"]],\"friday\":[[\"5:30\",\"22:00\"]],\"saturday\":[[\"5:30\",\"22:00\"]],\"sunday\":[[\"5:30\",\"22:00\"]]}", 
"{\"monday\":[[\"00:00\",\"00:30\"],[\"11:00\",\"20:00\"]],\"tuesday\":[[\"11:00\",\"20:00\"]],\"wednesday\":[[\"11:00\",\"20:00\"]],\"thursday\":[[\"11:00\",\"23:59\"]],\"friday\":[[\"00:00\",\"2:00\"],[\"11:00\",\"23:59\"]],\"saturday\":[[\"00:00\",\"2:30\"],[\"11:00\",\"23:59\"]],\"sunday\":[[\"00:00\",\"2:30\"],[\"11:00\",\"23:59\"]]}"
)), .Names = c("payload.name", "payload.hours_display", "payload.hours"
), row.names = c(3L, 5L, 10L, 14L, 18L, 19L, 24L, 32L, 33L, 35L, 
36L, 37L, 38L, 42L, 43L, 44L, 45L, 47L, 54L), class = "data.frame")

我尝试使用plyr :: ldply()函数根据这里的答案展开这个JSON - How to convert a list consisting of vector of different lengths to a usable data frame in R?并且非常接近,但是时间没有按正确的顺序排列(有些日子有多个操作小时)

最终,我想通过提取在周日中午12点之前开放的商家来操纵数据框架。

1 个答案:

答案 0 :(得分:1)

<强>更新

我已按照以下步骤设法将我需要的内容提取到单独的列中。

mydf = cbind(mydf, do.call(plyr::rbind.fill, lapply(paste0("
[",mydf$payload.hours,"]"), function(x) jsonlite::fromJSON(x))))

这会通过将每天的值提取为字符串列表来创建新列。

接下来,我取消列出值并展平数据框

x = stri_list2matrix(mydf$sunday, byrow=TRUE).
colnames(x) = c("From.1", "To.1", "From.2", "To.2")

最后,我将其绑定到原始数​​据框以获得结果。

mydf = data.frame(mydf,x)

清理后,它看起来像这样(显示特定的一天):

enter image description here