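I have a data frame df with an ID variable and daily dates (formatted as XYYYYMMDD) as column headers.
For each ID, I want to sum all the date columns that belong to the same month. Suppose yyyymm is a character vector holding the first three months:
yyyymm <- c("X201701","X201702","X201703")
I want to get a data frame want whose column headers are the strings in yyyymm. That is:
ID X201701 X201702 X201703
101 1 NA NA
102 1 3 1
203 1 1 NA
207 5 NA NA
209 2 1 NA
My idea was to avoid reshaping the dataset and to use lapply and grepl to partially match the strings, but I am missing something:
test = lapply(df, function(x) colSums(df[,grepl(x, names(df))]))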
Thanks a lot.
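Answer 0 (score: 1)
Here is an approach that uses the lubridate package to parse the dates and split.default to divide the data.frame into groups of columns that fall in the same month: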
library(lubridate)
factors = sapply(ymd(gsub("X", "", names(df)[-1])), function(x)
paste0(year(x), sprintf("%02d", as.integer(month(x)))))
data.frame(df[,1],
lapply(split.default(df[,-1], factors), function(x)
rowSums(x, na.rm = TRUE) * (NA^(rowSums(is.na(x)) == NCOL(x)))))
# ID X201701 X201702 X201703
#1 101 1 NA NA
#2 102 1 3 1
#3 203 1 1 NA
#4 207 5 NA NA
#5 209 2 1 NA
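The same grouping can also be done without lubridate, closer to the lapply/grepl idea from the question. The following is only a sketch under stated assumptions: the daily columns of df below are made-up sample data (the question does not show them), and sum_month is a hypothetical helper name. The explicit NA assignment plays the role of the NA^(...) multiplication above, which works because NA^TRUE is NA while NA^FALSE is 1.

```r
# Hypothetical sample data: the daily columns are assumptions, not from the question
df <- data.frame(
  ID        = c(101, 102),
  X20170101 = c(1, 1),
  X20170102 = c(NA, 0),
  X20170201 = c(NA, 3),
  X20170301 = c(NA, 1)
)
yyyymm <- c("X201701", "X201702", "X201703")

# Row-wise sum of all columns whose name starts with the month prefix m;
# rows where every value in that month is missing get NA instead of 0
sum_month <- function(m) {
  cols <- df[, grepl(paste0("^", m), names(df)), drop = FALSE]
  total <- rowSums(cols, na.rm = TRUE)
  total[rowSums(!is.na(cols)) == 0] <- NA
  total
}

# One summed column per month, named after the entries of yyyymm
want <- data.frame(ID = df$ID, setNames(lapply(yyyymm, sum_month), yyyymm))
print(want)
```

Using drop = FALSE keeps the selection a data frame even when a month matches a single column, so rowSums still works.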
Answer 1 (score: 0)
Is there a reason you do not want to reshape the data?
library(tidyverse)
want <- df %>%
gather(key, value, -ID) %>%
mutate(key = substr(key, 1, 7)) %>%
group_by(ID, key) %>%
summarise(value = sum(value, na.rm=TRUE)) %>%
spread(key, value)
# A tibble: 5 x 4
# Groups: ID [5]
# ID X201701 X201702 X201703
# * <dbl> <dbl> <dbl> <dbl>
# 1 101 1 0 0
# 2 102 1 3 1
# 3 203 1 1 0
# 4 207 5 0 0
# 5 209 2 1 0
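Note that sum(value, na.rm = TRUE) returns 0 for a group whose values are all NA, which is why this output shows 0 where the desired result has NA. One possible way around this, sketched here, is a small helper that returns NA for all-missing groups. sum_or_na is a hypothetical name, not a function from dplyr or tidyr.

```r
# Hypothetical helper: ignores NA when summing, but returns NA
# when every value in the group is missing
sum_or_na <- function(v) {
  if (all(is.na(v))) NA_real_ else sum(v, na.rm = TRUE)
}

sum_or_na(c(1, NA, 2))        # some values present: missing ones are skipped
sum_or_na(c(NA, NA))          # all values missing: result stays NA
sum(c(NA, NA), na.rm = TRUE)  # plain sum collapses this case to 0
```

In the pipeline above, it could be plugged in as summarise(value = sum_or_na(value)).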