Averaging a variable over a date interval in R

Posted: 2020-06-10 18:38:09

Tags: r loops dataframe date

I have two data.frames.

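DF1 contains ID-level data with two date columns, Date1 and Date2, plus State and County columns. An ID appears more than once if it has different Date1 values, but it always has the same Date2.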

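DF2 has a Date column (with several entries per date/state/county combination, because the data are collected multiple times a day), State, County, and the variable I am interested in, X.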

I want to add a column to DF1 that, for each person, holds the mean of X over the interval from Date1 to Date2 for that person's county and state.
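Written out as a formula, the requested column is roughly the following. This is a restatement under two assumptions taken from the loop in this question: both interval ends are inclusive, and every DF2 observation (including repeated entries on the same day) is weighted equally. Here i indexes rows of DF1 and j indexes rows of DF2:

$$
\text{MeanX}_i = \frac{1}{|S_i|} \sum_{j \in S_i} X_j,
\qquad
S_i = \{\, j : \text{State}_j = \text{State}_i,\ \text{County}_j = \text{County}_i,\ \text{Date1}_i \le \text{Date}_j \le \text{Date2}_i \,\}
$$

If S_i is empty the mean is undefined; R's mean() of an empty numeric vector returns NaN.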

DF2 contains more county/state combinations than DF1.

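Based on a previous question someone else asked, I tried the following, but got an error saying that the State columns in DF1 and DF2 have a different number of factor levels: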

# Attempt: row by row, mean of X over [Date1, Date2] for the matching State/County
for (i in 1:nrow(DF1)) {
  DF1$MeanX[i] <- mean(DF2$X[which(DF2$State == DF1$State[i] &
                                   DF2$County == DF1$County[i] &
                                   DF2$Date >= DF1$Date1[i] &
                                   DF2$Date <= DF1$Date2[i])])
}
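A likely cause of the error: R's == refuses to compare two factors whose level sets differ (it stops with "level sets of factors are different"), and DF1 and DF2 carry different State and County levels because DF2 has more combinations. A minimal sketch of one workaround, assuming it is acceptable to compare the labels as plain strings, wraps both sides in as.character(). The data below are invented toy values, not the poster's:

```r
# Hypothetical toy data: factor columns whose level sets differ between DF1 and DF2
DF1 <- data.frame(
  ID     = c(1L, 2L),
  Date1  = as.Date(c("2020-01-01", "2020-01-03")),
  Date2  = as.Date(c("2020-01-02", "2020-01-04")),
  State  = factor(c("Texas", "Texas"), levels = c("Florida", "Texas")),
  County = factor(c("Nueces", "Bexar"), levels = c("Bexar", "Duval", "Nueces"))
)
DF2 <- data.frame(
  Date   = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                     "2020-01-04", "2020-01-04")),
  X      = c(2, 6, 10, 20, 99),
  State  = factor(c("Texas", "Texas", "Texas", "Texas", "Alabama")),
  County = factor(c("Nueces", "Nueces", "Bexar", "Bexar", "Madison"))
)

# Reproduces the problem: comparing factors with different levels raises an error
err <- tryCatch(DF2$State == DF1$State[1], error = function(e) conditionMessage(e))
print(err)

# Workaround: compare the labels as character strings instead of as factors
DF1$MeanX <- NA_real_
for (i in seq_len(nrow(DF1))) {
  in_window <- as.character(DF2$State)  == as.character(DF1$State[i]) &
               as.character(DF2$County) == as.character(DF1$County[i]) &
               DF2$Date >= DF1$Date1[i] &
               DF2$Date <= DF1$Date2[i]
  DF1$MeanX[i] <- mean(DF2$X[in_window])
}
print(DF1$MeanX)  # 4 15
```

Converting the factor columns once before the loop (for example DF2$State <- as.character(DF2$State)) would avoid repeating the conversion on every iteration.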

Here is a trimmed-down version of DF1:

structure(list(ID = c(10L, 1003L, 1007L, 1007L, 101L, 101L, 101L, 
1011L, 1015L, 1016L), Date1 = structure(c(16724, 17317, 16919, 
17316, 17053, 17056, 17427, 16778, 17091, 17317), class = "Date"), 
Date2 = structure(c(16825, 17444, 17371, 17371, 17548, 17548, 
17548, 16839, 17378, 17378), class = "Date"), State = structure(c(8L, 
4L, 24L, 24L, 25L, 25L, 25L, 4L, 4L, 23L), .Label = c("", 
"Alabama", "Arizona", "California", "Colorado", "Connecticut", 
"District Of Columbia", "Florida", "Georgia", "Illinois", 
"Kansas", "Kentucky", "Maryland", "Mississippi", "Missouri", 
"Nebraska", "Nevada", "New York", "North Carolina", "Ohio", 
"Oklahoma", "Rhode Island", "Texas", "Virginia", "Washington"
), class = "factor"), County = structure(c(16L, 48L, 42L, 
42L, 29L, 29L, 29L, 48L, 48L, 37L), .Label = c("", "Anne Arundel", 
"Arlington", "Bay", "Bell", "Bexar", "Camden", "Chatham", 
"Chattahoochee", "Christian", "Clark", "Comanche", "Craven", 
"Cumberland", "District Of Columbia", "Duval", "El Paso", 
"Escambia", "Fairfax", "Geary", "Greene", "Hampton City", 
"Hardin", "Harrison", "Hillsborough", "Hinds", "Houston", 
"Island", "Kitsap", "Lake", "Liberty", "Montgomery", "New London", 
"Newport", "Newport News", "Norfolk City", "Nueces", "Okaloosa", 
"Onslow", "Orange", "Pierce", "Portsmouth", "Prince George's", 
"Prince William", "Pulaski", "Richmond", "San Bernadino", 
"San Diego", "Sarpy", "Solano", "Tom Green", "Virginia Beach City", 
"Yuma"), class = "factor")), row.names = c(NA, 10L), class = "data.frame")

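Here is a trimmed-down version of DF2 (the full data are too large, so this only shows the structure and does not include every state/county combination):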

structure(list(Date = structure(c(16441, 16447, 16453, 16459, 
16465, 16471, 16477, 16483, 16489, 16495), class = "Date"), X = c(8L, 
8L, 10L, 14L, 8L, 9L, 12L, 8L, 17L, 17L), State = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Alabama", "Alaska", 
"Arizona", "California", "Colorado", "District Of Columbia", 
"Florida", "Georgia", "Hawaii", "Louisiana", "Maryland", "Mississippi", 
"Nevada", "New Mexico", "North Carolina", "North Dakota", "Ohio", 
"Tennessee", "Texas", "Utah", "Virginia"), class = "factor"), 
County = structure(c(20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 
20L, 20L), .Label = c("Anchorage ", "Anne Arundel", "Bernalillo", 
"Bexar", "Bossier", "Brevard", "Clark", "Cumberland", "Davis", 
"District of Columbia", "Duval", "El Paso", "Fairfax", "Greene", 
"Hampton City", "Hillsborough", "Hinds", "Honolulu", "Kings", 
"Madison", "Monterey", "Montgomery", "Norfolk City", "Nueces", 
"Okaloosa", "Pima", "Prince George's", "Richmond", "San Bernardino", 
"San Diego", "Santa Barbara", "Shelby", "Solano", "Tarrant", 
"Ventura", "Ward", "Yuma"), class = "factor")), row.names = c(NA, 
10L), class = "data.frame")
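Separately from the factor error, the County labels in the two dput() outputs do not always agree: DF1 has "District Of Columbia" and "San Bernadino" where DF2 has "District of Columbia" and "San Bernardino", and DF2 has "Anchorage " with a trailing space. Exact comparisons silently find no match for such rows. A sketch, using a hypothetical normalize_key() helper, that trims whitespace and lower-cases the labels; the remaining spelling difference still has to be recoded by hand:

```r
# Hypothetical helper: turn a factor or character column into a comparable key
normalize_key <- function(x) tolower(trimws(as.character(x)))

# A few County labels copied from the dput() output of DF1 and DF2
df1_county <- factor(c("District Of Columbia", "San Bernadino"))
df2_county <- factor(c("District of Columbia", "San Bernardino", "Anchorage "))

key1 <- normalize_key(df1_county)
key2 <- normalize_key(df2_county)

# Case and surrounding whitespace no longer matter...
print(key1 %in% key2)  # TRUE FALSE
print(key2[3])         # "anchorage"

# ...but a spelling difference needs an explicit recode
key1[key1 == "san bernadino"] <- "san bernardino"
print(key1 %in% key2)  # TRUE TRUE
```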

Could someone help me adapt my loop to compute the mean of X (from DF2) between Date1 and Date2 (from DF1), matching on county and state between DF1 and DF2?
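If the loop turns out slow on the full data, one loop-free sketch in base R is to merge DF1 and DF2 on State and County, keep only the pairs whose Date falls inside [Date1, Date2], and average X per DF1 row. This is a swapped-in technique, not the poster's loop; the row_id helper column and the toy data are hypothetical:

```r
# Hypothetical toy data; ID 101 repeats with a different Date1 but the same Date2
DF1 <- data.frame(
  ID     = c(10L, 101L, 101L),
  Date1  = as.Date(c("2020-01-01", "2020-01-02", "2020-01-05")),
  Date2  = as.Date(c("2020-01-03", "2020-01-06", "2020-01-06")),
  State  = factor(c("Florida", "Washington", "Washington")),
  County = factor(c("Duval", "Kitsap", "Kitsap"))
)
DF2 <- data.frame(
  Date   = as.Date(c("2020-01-01", "2020-01-03", "2020-01-02",
                     "2020-01-04", "2020-01-04")),
  X      = c(1, 3, 10, 30, 500),
  State  = factor(c("Florida", "Florida", "Washington", "Washington", "Texas")),
  County = factor(c("Duval", "Duval", "Kitsap", "Kitsap", "Bexar"))
)

# Tag each DF1 row and use character join keys so factor levels cannot clash
DF1$row_id <- seq_len(nrow(DF1))
left  <- transform(DF1, State = as.character(State), County = as.character(County))
right <- transform(DF2, State = as.character(State), County = as.character(County))

# Pair every DF1 row with every DF2 row from the same state and county
pairs <- merge(left[, c("row_id", "State", "County", "Date1", "Date2")],
               right[, c("State", "County", "Date", "X")],
               by = c("State", "County"))

# Keep only observations inside the inclusive window [Date1, Date2]
pairs <- pairs[pairs$Date >= pairs$Date1 & pairs$Date <= pairs$Date2, ]

# Mean of X per DF1 row, attached back by row_id (rows with no data get NA)
means <- aggregate(X ~ row_id, data = pairs, FUN = mean)
DF1$MeanX <- means$X[match(DF1$row_id, means$row_id)]
DF1$row_id <- NULL
print(DF1$MeanX)  # 2 20 NA
```

Rows without any matching observations get NA here, whereas the loop gives NaN. Because merge() builds every DF1/DF2 pair within a state and county before filtering, memory use can grow quickly on large data; a non-equi join (for example in the data.table package) avoids materialising those intermediate pairs.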
