Sum over columns whose names string-match on dates

Time: 2017-09-19 16:04:16

Tags: r dataframe

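I have a data frame df with an ID variable and daily dates (in the format XYYYYMMDD) as column headers.

For each ID, I want to sum over all the dates/columns that belong to the same month. Suppose yyyymm is a character vector of the first three months:

yyyymm <- c("X201701","X201702","X201703")

I would like to obtain a data frame want with the strings in yyyymm as column headers. That is:

 ID X201701 X201702 X201703
101       1      NA      NA
102       1       3       1
203       1       1      NA
207       5      NA      NA
209       2       1      NA

My idea is to avoid reshaping the data set and to use the lapply and grepl functions to partially match the strings, but I am missing something:

test = lapply(df, function(x) colSums(df[,grepl(x, names(df))]))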

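Many thanks.

2 Answers:

Answer 0: (score: 1)

Here is an approach that uses the lubridate package to parse the dates and split.default to divide the data.frame into groups of columns that fall in the same month: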

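library(lubridate)

# Build a YYYYMM label for every date column (all columns except ID)
factors = sapply(ymd(gsub("X", "", names(df)[-1])), function(x)
    paste0(year(x), sprintf("%02d", as.integer(month(x)))))

# Split the date columns by month and sum each group row-wise.
# NA^TRUE is NA and NA^FALSE is 1, so a row whose values are all NA stays NA.
data.frame(ID = df[,1],
           lapply(split.default(df[,-1], factors), function(x)
               rowSums(x, na.rm = TRUE) * (NA^(rowSums(is.na(x)) == NCOL(x)))))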
#   ID X201701 X201702 X201703
#1 101       1      NA      NA
#2 102       1       3       1
#3 203       1       1      NA
#4 207       5      NA      NA
#5 209       2       1      NA
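
For comparison, here is a minimal base R sketch of the lapply/grepl idea from the question, without lubridate. The attempt in the question iterates over the columns of df rather than over yyyymm, and colSums adds down the rows instead of across the columns; the sketch loops over the month prefixes and uses rowSums instead. The daily columns and their values below are made up for illustration, since the question does not show them, and sum_month is a hypothetical helper name.

```r
# Hypothetical sample data: the real daily columns are not shown in the question.
df <- data.frame(
  ID        = c(101, 102),
  X20170101 = c(1, 1),
  X20170102 = c(NA, 0),
  X20170201 = c(NA, 3),
  X20170315 = c(NA, 1)
)
yyyymm <- c("X201701", "X201702", "X201703")

# Row-wise sum of all columns whose name starts with the given month prefix.
sum_month <- function(prefix) {
  cols  <- df[, grepl(paste0("^", prefix), names(df)), drop = FALSE]
  total <- rowSums(cols, na.rm = TRUE)
  # Keep NA when every daily value in that month is missing for a row.
  total[rowSums(!is.na(cols)) == 0] <- NA
  total
}

# One summed column per month, named after the entries of yyyymm.
monthly <- lapply(setNames(yyyymm, yyyymm), sum_month)
want    <- data.frame(ID = df$ID, monthly)
print(want)
```

Anchoring the pattern with "^" keeps a prefix such as X201701 from matching in the middle of another column name.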

Answer 1: (score: 0)

Is there a reason you don't want to reshape the data?

library(tidyverse)
want <- df %>%
          gather(key, value, -ID) %>%
          mutate(key = substr(key, 1, 7)) %>%
          group_by(ID, key) %>%
          summarise(value = sum(value, na.rm=TRUE)) %>%
          spread(key, value)

# A tibble: 5 x 4
# Groups:   ID [5]
     ID X201701 X201702 X201703
* <dbl>   <dbl>   <dbl>   <dbl>
1   101       1       0       0
2   102       1       3       1
3   203       1       1       0
4   207       5       0       0
5   209       2       1       0
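
Note that sum(value, na.rm = TRUE) returns 0 when every value in a group is NA, which is why the output above shows 0 where the desired result has NA. If the NA should be kept, one possible variation is sketched below. It assumes the dplyr and tidyr packages are installed, and the sample data are made up for illustration because the question does not show the daily columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data, not taken from the question.
df <- data.frame(
  ID        = c(101, 102),
  X20170101 = c(1, 1),
  X20170102 = c(NA, 0),
  X20170201 = c(NA, 3)
)

want <- df %>%
  gather(key, value, -ID) %>%
  mutate(key = substr(key, 1, 7)) %>%   # XYYYYMMDD -> XYYYYMM
  group_by(ID, key) %>%
  # Return NA for a month with no observed values, otherwise the sum.
  summarise(value = if (all(is.na(value))) NA_real_
                    else sum(value, na.rm = TRUE)) %>%
  ungroup() %>%
  spread(key, value)
print(want)
```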