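I have two data frames and I'm trying to find out whether there is a correlation between them. The basic question I'm trying to answer is whether winter weather patterns lead to a rise in the birth rate nine months later.
The data frames have already been reduced to just the information I (think I) need. The weather data frame only contains observations that line up with the births data frame nine months later. When I use the ccf function it plots the data fine, but I know I haven't set it up correctly. When computing the correlation, I need to account for the fact that one variable (model.weather) occurs nine months before the other (model.births).
Right now it is set up very simply: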
ccf(model.weather$EVENT_TYPE, model.births$BIRTH_TOTAL)
Can someone help me offset the data by nine months correctly?
Here is what the two data frames look like:
dput(model.weather)
structure(list(DATE = structure(c(13514, 13514, 13545, 13545,
13545, 13545, 13545, 13545, 13545, 13545, 13545, 13545, 13545,
13545, 13545, 13545, 13545, 13545, 13545, 13545, 13573, 13573,
13573, 13573, 13573, 13573, 13573, 13573, 13573, 13573, 13573,
13573, 13573, 13573, 13573, 13573, 13573, 13573, 13573, 13573,
13573, 13573, 13604, 13604, 13604, 13848, 13848, 13848, 13848,
13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848,
13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848,
13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848,
13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848,
13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848, 13879,
13879, 13879, 13879, 13879, 13879, 13879, 13879, 13879, 13879,
13879, 13879, 13879, 13879, 13879, 13879, 13879, 13879, 13879,
13879, 13879, 13879, 13910, 13910, 13910, 13910, 13910, 13910,
13910, 13910, 13910, 13910, 13910, 13910, 13910, 13910, 13910,
13910, 13910, 13910, 13910, 13910, 13910, 13910, 13910, 13939,
13939, 13939, 13939, 13939, 13939, 13939, 14214, 14214, 14214,
14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214,
14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214,
14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214,
14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214,
14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214,
14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214,
14214, 14214, 14245, 14245, 14245, 14245, 14245, 14245, 14245,
14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245,
14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245,
14245, 14245, 14245, 14245, 14276, 14276, 14276, 14276, 14304,
14304, 14304, 14304, 14304, 14304, 14304, 14304, 14304, 14304,
14304, 14304, 14304, 14304, 14304, 14304, 14304, 14304, 14304,
14304, 14304, 14579, 14579, 14579, 14579, 14579, 14579, 14579,
14579, 14579, 14579, 14579, 14579, 14579, 14579, 14579, 14579,
14579, 14579, 14579, 14579, 14579, 14579, 14579, 14579, 14579,
14579, 14579, 14579, 14610, 14610, 14610, 14641, 14641, 14641,
14641, 14641, 14641, 14641, 14641, 14641, 14641, 14641, 14641,
14641, 14641, 14641, 14641, 14641, 14641, 14641, 14641, 14641,
14641, 14641, 14641, 14641, 14944, 14944, 14944, 14944, 14944,
14944, 14944, 14944, 14944, 14944, 14944, 14944, 14944, 14944,
14944, 14944, 14944, 14944, 14944, 14944, 14944, 14944, 14944,
14944, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975,
14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975,
14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975,
14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975,
14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975,
14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975,
14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975,
14975, 14975, 14975, 15006, 15006, 15006, 15006, 15006, 15006,
15006, 15006, 15006, 15006, 15006, 15006, 15006, 15006, 15006,
15006, 15006, 15006, 15006, 15006, 15006, 15006, 15006, 15034
), class = "Date"), EVENT_TYPE = structure(c(5L, 5L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L,
5L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 2L, 2L, 2L, 8L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Hail", "Heavy Snow",
"Winter Storm", "Winter Weather", "Ice Storm", "Frost/Freeze",
"WINTER WEATHER", "Blizzard"), class = "factor")), .Names = c("DATE",
"EVENT_TYPE"), row.names = c(1475L, 1476L, 1477L, 1478L, 1479L,
1480L, 1481L, 1482L, 1483L, 1484L, 1485L, 1486L, 1487L, 1488L,
1489L, 1490L, 1491L, 1492L, 1493L, 1494L, 1495L, 1496L, 1497L,
1498L, 1499L, 1500L, 1501L, 1502L, 1503L, 1504L, 1505L, 1506L,
1507L, 1508L, 1509L, 1510L, 1511L, 1512L, 1513L, 1514L, 1515L,
1516L, 1519L, 1520L, 1521L, 1588L, 1589L, 1590L, 1591L, 1592L,
1593L, 1594L, 1595L, 1596L, 1597L, 1598L, 1599L, 1600L, 1601L,
1602L, 1603L, 1604L, 1605L, 1606L, 1608L, 1609L, 1610L, 1611L,
1612L, 1613L, 1614L, 1615L, 1616L, 1617L, 1618L, 1619L, 1620L,
1621L, 1622L, 1623L, 1624L, 1625L, 1626L, 1627L, 1628L, 1629L,
1630L, 1631L, 1632L, 1633L, 1634L, 1635L, 1636L, 1638L, 1642L,
1643L, 1644L, 1645L, 1646L, 1647L, 1648L, 1649L, 1650L, 1651L,
1652L, 1653L, 1654L, 1655L, 1656L, 1657L, 1658L, 1659L, 1660L,
1661L, 1662L, 1665L, 1666L, 1671L, 1672L, 1673L, 1674L, 1679L,
1680L, 1681L, 1682L, 1683L, 1684L, 1685L, 1686L, 1687L, 1688L,
1689L, 1690L, 1691L, 1692L, 1693L, 1694L, 1696L, 1697L, 1698L,
1699L, 1700L, 1701L, 1702L, 1703L, 1863L, 1864L, 1865L, 1866L,
1867L, 1868L, 1869L, 1870L, 1871L, 1872L, 1873L, 1874L, 1877L,
1878L, 1879L, 1880L, 1881L, 1882L, 1883L, 1884L, 1885L, 1886L,
1887L, 1888L, 1889L, 1890L, 1891L, 1892L, 1893L, 1894L, 1895L,
1896L, 1897L, 1898L, 1899L, 1900L, 1901L, 1902L, 1903L, 1904L,
1905L, 1906L, 1907L, 1910L, 1911L, 1916L, 1917L, 1918L, 1919L,
1920L, 1921L, 1922L, 1923L, 1924L, 1925L, 1926L, 1927L, 1928L,
1929L, 1933L, 1934L, 1935L, 1938L, 1940L, 1941L, 1942L, 1943L,
1944L, 1945L, 1946L, 1947L, 1948L, 1950L, 1951L, 1952L, 1953L,
1955L, 1956L, 1957L, 1958L, 1959L, 1960L, 1961L, 1962L, 1964L,
1965L, 1966L, 1967L, 1968L, 1969L, 1974L, 1976L, 1977L, 1978L,
1979L, 1980L, 1981L, 1982L, 1983L, 1984L, 1985L, 1986L, 1987L,
1988L, 1989L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L,
1998L, 2071L, 2072L, 2073L, 2074L, 2075L, 2076L, 2077L, 2078L,
2079L, 2080L, 2081L, 2082L, 2083L, 2084L, 2085L, 2086L, 2087L,
2088L, 2089L, 2090L, 2091L, 2092L, 2093L, 2094L, 2095L, 2096L,
2097L, 2098L, 2105L, 2106L, 2107L, 2108L, 2109L, 2110L, 2111L,
2112L, 2113L, 2114L, 2115L, 2116L, 2117L, 2118L, 2119L, 2122L,
2123L, 2124L, 2125L, 2126L, 2127L, 2128L, 2129L, 2130L, 2131L,
2132L, 2133L, 2134L, 2184L, 2185L, 2186L, 2187L, 2189L, 2190L,
2191L, 2192L, 2193L, 2194L, 2195L, 2196L, 2197L, 2198L, 2199L,
2200L, 2201L, 2202L, 2203L, 2204L, 2205L, 2206L, 2207L, 2208L,
2209L, 2212L, 2213L, 2214L, 2215L, 2216L, 2217L, 2218L, 2219L,
2220L, 2221L, 2222L, 2223L, 2224L, 2225L, 2226L, 2227L, 2228L,
2229L, 2230L, 2231L, 2232L, 2233L, 2234L, 2235L, 2236L, 2237L,
2238L, 2239L, 2240L, 2241L, 2242L, 2243L, 2244L, 2245L, 2246L,
2247L, 2248L, 2249L, 2250L, 2251L, 2252L, 2253L, 2254L, 2255L,
2256L, 2257L, 2258L, 2259L, 2260L, 2261L, 2262L, 2263L, 2264L,
2265L, 2266L, 2267L, 2268L, 2269L, 2270L, 2271L, 2272L, 2273L,
2274L, 2275L, 2276L, 2277L, 2278L, 2279L, 2280L, 2281L, 2282L,
2283L, 2284L, 2285L, 2286L, 2287L, 2288L, 2289L, 2290L, 2291L,
2292L, 2293L, 2294L, 2295L, 2303L, 2304L, 2305L, 2308L), class = "data.frame")
dput(model.births)
structure(list(DATE = structure(c(13514, 13545, 13573, 13604,
13634, 13665, 13695, 13726, 13757, 13787, 13818, 13848, 13879,
13910, 13939, 13970, 14000, 14031, 14061, 14092, 14123, 14153,
14184, 14214, 14245, 14276, 14304, 14335, 14365, 14396, 14426,
14457, 14488, 14518, 14549, 14579, 14610, 14641, 14669, 14700,
14730, 14761, 14791, 14822, 14853, 14883, 14914, 14944, 14975,
15006, 15034, 15065, 15095, 15126, 15156, 15187, 15218, 15248,
15279, 15309), class = "Date"), BIRTH_TOTAL = c(6250, 5833, 6570,
6227, 6858, 6735, 6933, 7291, 6385, 6466, 6198, 6221, 6341, 6051,
6444, 6396, 6781, 6583, 6820, 6803, 6531, 6510, 5627, 6135, 5976,
5515, 6208, 6261, 6520, 6509, 6834, 6616, 6489, 6318, 5730, 6040,
5667, 5459, 6162, 6212, 6221, 6194, 6469, 6380, 6342, 5981, 5853,
5925, 5979, 5414, 6070, 6085, 6242, 6438, 6506, 6459, 6260, 6158,
5754, 5801)), .Names = c("DATE", "BIRTH_TOTAL"), row.names = c(NA,
-60L), class = "data.frame")
Answer
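As we discussed in the comments, you have to compare "apples to apples", so the two data sets must be matched on unique dates.
The first approach is to give every event the same weight, count the events per date, and compare the counts with "model.births":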
## Aggregating "model.weather" by date and counting events
aggmodel.weather <- aggregate(EVENT_TYPE ~ DATE, data = model.weather, length)
## Merging to "model.births" by DATE
model.births <- merge(model.births, aggmodel.weather, by = "DATE", all.x = TRUE)
## Setting the missing events to zero
model.births[is.na(model.births$EVENT_TYPE), "EVENT_TYPE"] <- 0
## Running the `ccf` function; note that the `ccf` documentation states "The lag k value returned by ccf(x, y) estimates the correlation between x[t+k] and y[t]"
## With births as x and event counts as y, a positive lag k pairs each month's events with the births k months later
ccf(model.births$BIRTH_TOTAL, model.births$EVENT_TYPE)
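A quick way to convince yourself of the lag direction is a toy check on synthetic data (the vectors `x` and `y` below are made up for illustration and are not the question's data). Here `x` is `y` delayed by 9 steps, so under the documented convention the peak should sit at lag +9. Because plain vectors are converted with `as.ts()` (frequency 1), one lag is one row, i.e. one month for the merged monthly data, assuming no months are missing. With `ts` objects of `frequency = 12` the lag axis would be in years instead, so 9 months would show up as 0.75. The sketch also computes `qnorm(0.975) / sqrt(n.used)`, the approximate 95% bound that `plot()` draws as dashed lines, so the lag-9 value can be read off numerically instead of from the plot.

```r
## Toy check of the ccf() lag convention on synthetic data (not the question's data).
set.seed(1)
n <- 60
y <- rnorm(n)                      # stand-in for the monthly event counts
x <- c(rnorm(9), y[1:(n - 9)])     # stand-in for births: x[t + 9] == y[t]

cc <- ccf(x, y, lag.max = 12, plot = FALSE)
lags <- drop(cc$lag)               # lags -12..12, one per row (month)
r <- drop(cc$acf)                  # matching cross-correlations

best_lag <- lags[which.max(r)]     # lag with the strongest positive correlation
r_at_9 <- r[lags == 9]             # correlation between x[t + 9] and y[t]
bound <- qnorm(0.975) / sqrt(cc$n.used)  # approx. 95% white-noise bound (dashed lines in the plot)

print(best_lag)
print(c(r_at_9 = r_at_9, bound = bound))
```

On the real data, the same lookup would be `drop(cc$acf)[drop(cc$lag) == 9]` after `cc <- ccf(model.births$BIRTH_TOTAL, model.births$EVENT_TYPE, plot = FALSE)`.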
The output is conclusive. For more details, see here.
The second approach is to compare each event type in "model.weather" with "model.births" separately:
## Checking the event types
table(model.weather$EVENT_TYPE)
##           Hail     Heavy Snow   Winter Storm Winter Weather
##              0            283            127              0
##      Ice Storm   Frost/Freeze WINTER WEATHER       Blizzard
##             16              0              0              1
## Let's try "Heavy Snow", as it is the most frequent type (doing everything as before)
Heavy.Snow <- model.weather[model.weather$EVENT_TYPE == "Heavy Snow", ]
Heavy.Snow <- aggregate(EVENT_TYPE ~ DATE, data = Heavy.Snow, length)
## model.births already has an EVENT_TYPE column from the first approach, so merge() suffixes them as EVENT_TYPE.x (old) and EVENT_TYPE.y (new)
model.births <- merge(model.births, Heavy.Snow, by = "DATE", all.x = TRUE)
model.births[is.na(model.births$EVENT_TYPE.y), "EVENT_TYPE.y"] <- 0
ccf(model.births$BIRTH_TOTAL, model.births$EVENT_TYPE.y)
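The output looks almost the same. You can also try some of the other "EVENT_TYPE" values.
This code is only meant as a starting point; for further analysis, see the link above.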
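To try every event type in one go instead of repeating the merge by hand, the same steps could be wrapped in a function. This is a minimal sketch, not part of the original answer: `lag_cor_by_type` is a hypothetical helper that assumes the question's column names (`DATE`, `EVENT_TYPE`, `BIRTH_TOTAL`) and one births row per consecutive month. It skips factor levels with no events (such as Hail or Frost/Freeze here), since an all-zero series has no defined correlation, and returns the `ccf` value at lag 9 for each remaining type. The demo data are invented: Heavy Snow every January and February, followed by extra births nine months later.

```r
## Hypothetical helper: lag-k cross-correlation between monthly births and the
## monthly count of each event type (the same aggregate/merge/ccf steps as above).
lag_cor_by_type <- function(weather, births, k = 9) {
  present <- table(weather$EVENT_TYPE)
  types <- names(present)[present > 0]             # skip levels with no events
  sapply(types, function(tp) {
    ev <- weather[weather$EVENT_TYPE == tp, ]
    counts <- aggregate(EVENT_TYPE ~ DATE, data = ev, FUN = length)
    m <- merge(births, counts, by = "DATE", all.x = TRUE)  # result sorted by DATE
    m$EVENT_TYPE[is.na(m$EVENT_TYPE)] <- 0                 # months without events
    cc <- ccf(m$BIRTH_TOTAL, m$EVENT_TYPE, lag.max = k, plot = FALSE)
    drop(cc$acf)[drop(cc$lag) == k]                        # cor(births[t + k], events[t])
  })
}

## Invented demo data: Heavy Snow every Jan/Feb, births +500 nine months later,
## plus two unrelated Ice Storm months; "Hail" is an unused factor level.
dates <- seq(as.Date("2007-01-01"), by = "month", length.out = 36)
snow_idx <- which(format(dates, "%m") %in% c("01", "02"))
births <- data.frame(DATE = dates, BIRTH_TOTAL = 6000)
births$BIRTH_TOTAL[snow_idx + 9] <- 6500
weather <- data.frame(
  DATE = c(dates[snow_idx], dates[c(5, 20)]),
  EVENT_TYPE = factor(c(rep("Heavy Snow", length(snow_idx)), "Ice Storm", "Ice Storm"),
                      levels = c("Hail", "Heavy Snow", "Ice Storm"))
)
res <- lag_cor_by_type(weather, births, k = 9)
print(round(res, 2))
```

With the real data this would be `lag_cor_by_type(model.weather, model.births)`, run on the original two-column `model.births` (before the event counts are merged in), otherwise `merge()` produces suffixed `EVENT_TYPE.x`/`EVENT_TYPE.y` columns and `m$EVENT_TYPE` no longer exists.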
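Finally, if you want to shift the "model.births" data by 9 months yourself, so that each row pairs that month's events with the births 9 months later, you can do this:
## BIRTH_TOTAL2[t] is BIRTH_TOTAL[t + 9]; the last 9 rows get NA
model.births$BIRTH_TOTAL2 <- c(model.births$BIRTH_TOTAL[10:length(model.births$BIRTH_TOTAL)], rep(NA, 9))
## Drop the rows that have no birth figure 9 months ahead
model.births <- model.births[complete.cases(model.births), ]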
"BIRTH_TOTAL2" will then be your lagged variable.
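Once "BIRTH_TOTAL2" exists, the lag-9 relationship becomes an ordinary lag-0 correlation that `cor.test()` can test. The sketch below replays the shift on invented data; `lead_by` is a hypothetical helper doing the same thing as the "BIRTH_TOTAL2" line, written so that `k = 0` returns the input unchanged. Note that `cor.test()` uses only the overlapping rows and their own means, so its estimate will differ slightly from the lag-9 bar of `ccf()`, and its p-value assumes independent observations, which monthly, seasonal birth counts are not, so treat it as a rough guide.

```r
## Hypothetical helper: out[t] = x[t + k], padded with NA at the end
## (the same shift as the BIRTH_TOTAL2 line above; k = 0 returns x unchanged).
lead_by <- function(x, k) c(x[seq_along(x) > k], rep(NA, k))

## Invented demo data: 36 months, events every Jan/Feb,
## births raised by 500 nine months after each event, plus some noise.
set.seed(42)
events <- rep(c(1, 1, rep(0, 10)), 3)
births <- 6000 + 500 * c(rep(0, 9), events[1:27]) + rnorm(36, sd = 50)

d <- data.frame(EVENT_TYPE = events, BIRTH_TOTAL = births)
d$BIRTH_TOTAL2 <- lead_by(d$BIRTH_TOTAL, 9)   # births 9 months ahead
d <- d[complete.cases(d), ]                   # drops the last 9 rows
ct <- cor.test(d$BIRTH_TOTAL2, d$EVENT_TYPE)  # Pearson correlation at lag 0
print(ct$estimate)
print(ct$p.value)
```

On the real data the call would be `cor.test(model.births$BIRTH_TOTAL2, model.births$EVENT_TYPE)`, using `EVENT_TYPE.x` or `EVENT_TYPE.y` instead if you ran the second merge.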