Data transformation with tidyverse (one column)

Time: 2019-05-08 11:19:38

Tags: r dataframe tidyverse

I have a table named DATA_TEST. It contains a single column holding seven different cases of data.


# DATA
DATA_TEST <- data.frame(
  CUSTOMS_RATE = c("10", "20.1", "15+0,41 eur/kg", "10+0,1 eur/kg max.17",
                   "0,1 eur/l max.17", "0,04 eur/kg max.10", "NA"))
View(DATA_TEST)

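My intention is to split this column into three separate columns so that I can carry on with other statistical operations (calculating means and so on), for example as a table DATA_TEST1 with the columns RATE, SPECIFIC_RATE and MAXIMUM_RATE.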


Thanks to this community I obtained the code below. However, it has a bug: the second record, "20.1", does not stay in the RATE column but ends up in the next one, SPECIFIC_RATE.

library(tidyverse)
DATA_TEST %>%
  mutate(CUSTOMS_RATE = str_replace_all(CUSTOMS_RATE, ",", "."),
         RATE = str_extract(CUSTOMS_RATE, "^[0-9]+(?=\\+|$)"), 
         SPECIFIC_RATE = str_extract(CUSTOMS_RATE, "\\d+\\.\\d+"), 
         MAXIMUM_RATE = str_extract(CUSTOMS_RATE, "(?<=max\\.)\\d+")) %>% 
  select(2:4) %>%
  mutate_all(as.numeric)

Can anyone help me solve this problem?

1 Answer:

Answer 0: (score: 1)

One option is to change the pattern for RATE to

RATE = str_extract(CUSTOMS_RATE, "^[0-9]+(?=\\+|$)|^[0-9.]+$")
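
To see why the extra alternative helps, here is a minimal sketch that runs the old and the new RATE pattern on a few of the sample values (with commas already replaced by dots). The vector `x` and the names `old_rate`, `new_rate`, `old_result` and `new_result` are illustrative only and are not part of the original code.

```r
library(stringr)

# Illustrative subset of the sample values, commas already replaced by dots
x <- c("10", "20.1", "15+0.41 eur/kg", "0.1 eur/l max.17")

# Original pattern: leading digits that are followed by "+" or the end of the string
old_rate <- "^[0-9]+(?=\\+|$)"
# Changed pattern: additionally accept a string made up only of digits and dots
new_rate <- "^[0-9]+(?=\\+|$)|^[0-9.]+$"

old_result <- str_extract(x, old_rate)
new_result <- str_extract(x, new_rate)

old_result
# "10" NA     "15" NA
new_result
# "10" "20.1" "15" NA
```

With the old pattern, "20" is followed by "." rather than "+" or the end of the string, so nothing is extracted. The second alternative matches "20.1" as a whole, while "0.1 eur/l max.17" still yields NA because it does not consist solely of digits and dots.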

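-full code (SPECIFIC_RATE also gains a (?=\\s) lookahead, so a bare decimal such as "20.1" is no longer captured there)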

DATA_TEST %>%
  mutate(CUSTOMS_RATE = str_replace_all(CUSTOMS_RATE, ",", "."), 
  RATE = str_extract(CUSTOMS_RATE, "^[0-9]+(?=\\+|$)|^[0-9.]+$"), 
  SPECIFIC_RATE = str_extract(CUSTOMS_RATE, "\\d+\\.\\d+(?=\\s)"), 
  MAXIMUM_RATE = str_extract(CUSTOMS_RATE, "(?<=max\\.)\\d+")) %>% 
  select(2:4) %>% 
  mutate_all(as.numeric)
#  RATE SPECIFIC_RATE MAXIMUM_RATE
#1 10.0            NA           NA
#2 20.1            NA           NA
#3 15.0          0.41           NA
#4 10.0          0.10           17
#5   NA          0.10           17
#6   NA          0.04           10
#7   NA            NA           NA
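
The SPECIFIC_RATE pattern in the code above differs from the one in the question by a trailing `(?=\\s)` lookahead. The following sketch compares the two patterns on a few sample values; `x`, `old_specific`, `new_specific`, `old_result` and `new_result` are illustrative names, not part of the original code.

```r
library(stringr)

# Illustrative subset of the sample values, commas already replaced by dots
x <- c("20.1", "15+0.41 eur/kg", "0.1 eur/l max.17")

# Pattern from the question: any decimal number
old_specific <- "\\d+\\.\\d+"
# Pattern from the answer: a decimal number that is followed by whitespace
new_specific <- "\\d+\\.\\d+(?=\\s)"

old_result <- str_extract(x, old_specific)
new_result <- str_extract(x, new_specific)

old_result
# "20.1" "0.41" "0.1"
new_result
# NA     "0.41" "0.1"
```

Because the lookahead requires whitespace after the number, this relies on every specific rate being followed by a unit such as " eur/kg", which holds for the sample data shown here. A specific rate at the very end of a string would not be extracted.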