Parsing a column in R (or another language, e.g. SQL)

Time: 2018-10-13 20:09:26

Tags: sql r

This is the current data frame:

baking_time <- c("20 to 30 min", "20 to 30 min", "40 to 50 min", "10 to 30 min", "60 to 90 min", "40 to 50 min")
cake_type <- c("Chocolate", "Chocolate","Lemon","Tart","German","Lemon")


recipes <- data.frame(baking_time, cake_type)

Now I am trying to parse the baking time to get this:

baking_time <- c(25, 25, 45, 20, 75, 45)

I tried using parse_number, but I am having trouble parsing the two numbers and then operating on them:

mutate(avg_time = (parse_number(baking_time) + parse_number(baking_time))/2)

3 answers:

Answer 0: (score: 7)

We extract the numeric parts of the column and take the mean of each:

library(tidyverse)
recipes %>%
  mutate(avg_time = str_extract_all(baking_time, "\\d+") %>%
           # map_dbl returns a plain numeric vector rather than a list column
           map_dbl(~ mean(as.numeric(.x))))
#   baking_time  cake_type avg_time
#1 20 to 30 min Chocolate       25
#2 20 to 30 min Chocolate       25
#3 40 to 50 min     Lemon       45
#4 10 to 30 min      Tart       20
#5 60 to 90 min    German       75
#6 40 to 50 min     Lemon       45

Note: parse_number extracts only the first numeric part. If a value contains more than one number, it needs to be split first, with readr::parse_number applied to each piece:

recipes %>%
  separate(baking_time, into = c("first", "second"), sep = " to ", remove = FALSE) %>%
  transmute(baking_time, avg_time = (parse_number(first) + parse_number(second))/2)

With base R, one option is to use gsub to change the non-numeric parts into a delimiter, read the result with read.csv, and take the rowMeans.
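The code for the base R route (gsub, then read.csv, then rowMeans) did not survive on this page. The following is only a sketch of how those three steps could fit together, not the answerer's original code. The variable names `delimited`, `nums` and `avg_time` are assumptions made for illustration.

```r
baking_time <- c("20 to 30 min", "20 to 30 min", "40 to 50 min",
                 "10 to 30 min", "60 to 90 min", "40 to 50 min")

# Drop the trailing non-digits (" min"), then turn every remaining run of
# non-digits (" to ") into a comma: "20 to 30 min" becomes "20,30"
delimited <- gsub("\\D+", ",", sub("\\D+$", "", baking_time))

# Read the comma-delimited strings as a two-column table, one row per string
nums <- read.csv(text = delimited, header = FALSE)

# Average the two columns row by row
avg_time <- rowMeans(nums)
```

Stripping the trailing " min" before the substitution avoids a trailing comma, which would otherwise make read.csv create an extra, empty third column.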

Answer 1: (score: 4)

You can get the times in base R using gregexpr and regmatches.

Times = regmatches(baking_time, gregexpr("\\d+", baking_time))
sapply(Times, function(x) mean(as.numeric(x)))
[1] 25 25 45 20 75 45
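To store the result as a column of recipes rather than printing it, the same idea might be written as below. This is a sketch under assumptions: the column name `avg_time` is borrowed from the question, and `stringsAsFactors = FALSE` is set so that `baking_time` stays a character vector on R versions before 4.0, where data.frame converts strings to factors by default.

```r
baking_time <- c("20 to 30 min", "20 to 30 min", "40 to 50 min",
                 "10 to 30 min", "60 to 90 min", "40 to 50 min")
cake_type <- c("Chocolate", "Chocolate", "Lemon", "Tart", "German", "Lemon")
recipes <- data.frame(baking_time, cake_type, stringsAsFactors = FALSE)

# Pull every run of digits out of each string (a list of character vectors)
times <- regmatches(recipes$baking_time, gregexpr("[0-9]+", recipes$baking_time))

# vapply enforces exactly one numeric value per row, unlike sapply
recipes$avg_time <- vapply(times, function(x) mean(as.numeric(x)), numeric(1))
```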

Answer 2: (score: 2)

stringi (stringr without the restraints) and base R:

stringi::stri_match_first_regex(
  recipes$baking_time,
  "([[:digit:]]+)[[:space:]]+to[[:space:]]+([[:digit:]]+)"
)[,2:3] -> x
class(x) <- "numeric"
apply(x, 1, mean)
## [1] 25 25 45 20 75 45