Extract year and week from a string in an R data.table

Time: 2020-10-24 17:01:42

Tags: r data.table

I have a data.table.

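I need to add two more columns, "year" and "week", that extract the year and the week from the "year_week" column. I have done this with ifelse statements, but that is inefficient, and more than 51 nested ifelse statements cannot be used.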

dt <- data.table::data.table(
  "year_week" = c("y_9001", "y_9002", "y_9003", "y_9004", "y_9005", "y_9101", "y_9102", "y_9103", "y_9104", "y_9105" )
)

Any suggestions for doing this without ifelse statements? Thanks.

3 Answers:

Answer 0: (score: 0)

You can use the substr function:

library(data.table)

dt <- data.table(
  "year_week" = c("y_9001", "y_9002", "y_9003", "y_9004", "y_9005", "y_9101", "y_9102", "y_9103", "y_9104", "y_9105" )
)
dt[, year := substr(year_week, 3, 4)][
   , week := substr(year_week, 5, 6)]
dt
#>     year_week year week
#>  1:    y_9001   90   01
#>  2:    y_9002   90   02
#>  3:    y_9003   90   03
#>  4:    y_9004   90   04
#>  5:    y_9005   90   05
#>  6:    y_9101   91   01
#>  7:    y_9102   91   02
#>  8:    y_9103   91   03
#>  9:    y_9104   91   04
#> 10:    y_9105   91   05

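Since I don't know which years your dataset covers, you will need to decide whether it is safe to simply paste "19" in front of every year, or whether some years need "20" because they fall in a different century.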
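If the data does span two centuries, one possible approach is a pivot-year rule. The sketch below is only an illustration: the cutoff of 50 (`pivot`) and the extra value "y_0512" are assumptions that are not part of the question, so pick a cutoff that matches your own data.

```r
library(data.table)

dt <- data.table(
  year_week = c("y_9001", "y_9105", "y_0512")  # "y_0512" is a made-up value
)

# Hypothetical cutoff: two-digit years >= 50 are read as 19xx, the rest as 20xx.
pivot <- 50L

# Two-digit year and week as integers, taken from fixed character positions.
dt[, yy := as.integer(substr(year_week, 3, 4))]
dt[, week := as.integer(substr(year_week, 5, 6))]

# Add 100 years only where the two-digit year is below the cutoff.
dt[, year := 1900L + yy + 100L * (yy < pivot)]

# Drop the helper column.
dt[, yy := NULL]
dt
```

This stays vectorised, so no nested ifelse calls are needed however many distinct years there are.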

Answer 1: (score: 0)

This works:

> library(dplyr)
> dt %>% mutate(year = paste0("19", substr(year_week, 3, 4)),
+               week = as.numeric(substr(year_week, 5, 6)))
    year_week year week
 1:    y_9001 1990    1
 2:    y_9002 1990    2
 3:    y_9003 1990    3
 4:    y_9004 1990    4
 5:    y_9005 1990    5
 6:    y_9101 1991    1
 7:    y_9102 1991    2
 8:    y_9103 1991    3
 9:    y_9104 1991    4
10:    y_9105 1991    5

Answer 2: (score: 0)

Does this work? It is simple, but it may be a bit slow if your df is very large.

library(stringr)

dt$year<-paste0(19, str_sub(dt$year_week, 3L, 4L))
dt$week<-str_sub(dt$year_week, 5L, -1L)

dt     

    year_week year week
 1:    y_9001 1990   01
 2:    y_9002 1990   02
 3:    y_9003 1990   03
 4:    y_9004 1990   04
 5:    y_9005 1990   05
 6:    y_9101 1991   01
 7:    y_9102 1991   02
 8:    y_9103 1991   03
 9:    y_9104 1991   04
10:    y_9105 1991   05
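
If speed is a concern on a large table, one option is to add both columns in a single data.table call with `:=`, which adds columns by reference instead of assigning through `$<-`. This is a sketch rather than a benchmarked recommendation; the shortened sample data is for illustration, and the "19" prefix assumes every year is in the 1900s, as in the answers above.

```r
library(data.table)

dt <- data.table(
  year_week = c("y_9001", "y_9002", "y_9101")
)

# Add both columns in one call; `:=` modifies dt in place.
# Assumption: all years belong to the 1900s, hence the fixed "19" prefix.
dt[, `:=`(
  year = paste0("19", substr(year_week, 3, 4)),
  week = substr(year_week, 5, 6)
)]
dt
```

Both new columns are character here, which keeps the leading zero of the week; wrap them in as.integer() if numeric columns are preferred.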