Frequency of individual events per year

Date: 2017-09-03 04:58:07

Tags: r dataframe frequency

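I'm very new to R and still learning. I want to take a list of individual events, which records the year and the location of each event, and turn it into a dataset that counts how many times an event occurred at each location in each year. Here is a sample of my current dataset: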

Event|Year |Location
-----|-----|--------
    1| 2001| A
    2| 2003| B
    3| 2001| B
    4| 2003| A
    5| 2002| C
    6| 2001| B
    7| 2002| A
    8| 2003| C
    9| 2002| A
   10| 2002| A

This is the dataset I'm looking for, so that I can run a regression on it:

Year|Location| Number of Occurrences 
----|--------|----------------------
2001|       A| 1
2002|       A| 3
2003|       A| 1
2001|       B| 2
2002|       B| 0
2003|       B| 1
2001|       C| 0
2002|       C| 1
2003|       C| 1

3 answers:

Answer 0 (score: 2)

The simplest way is to count the frequencies with table and then convert the result to a data.frame:

data.frame(table(df1$Year, df1$Location))
  Var1 Var2 Freq
1 2001    A    1
2 2002    A    3
3 2003    A    1
4 2001    B    2
5 2002    B    0
6 2003    B    1
7 2001    C    0
8 2002    C    1
9 2003    C    1

To add variable names, you can use setNames, like this:

setNames(data.frame(table(df1$Year, df1$Location)),
         c("Year", "Location", "Event"))
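A variant of the table approach, sketched under the assumption that named columns and a numeric Year are what the regression needs: naming the arguments to table() carries those names into the data frame, and the responseName argument of as.data.frame() names the count column, so setNames is not needed. Because table() stores Year as a factor, the sketch converts it back to an integer, which matters if Year should enter a regression as a numeric trend rather than as categories. The column name Number_of_Occurrences is taken from the question.

```r
# Sample data from the question
df1 <- data.frame(
  Event = 1:10,
  Year = c(2001L, 2003L, 2001L, 2003L, 2002L, 2001L, 2002L, 2003L, 2002L, 2002L),
  Location = c("A", "B", "B", "A", "C", "B", "A", "C", "A", "A"),
  stringsAsFactors = FALSE
)

# Named arguments to table() become the column names of the result;
# responseName sets the name of the count column (default "Freq")
counts <- as.data.frame(table(Year = df1$Year, Location = df1$Location),
                        responseName = "Number_of_Occurrences")

# table() stores Year as a factor; convert it back to integer,
# going through character so the labels (not the factor codes) are kept
counts$Year <- as.integer(as.character(counts$Year))

counts
```

Converting Year is optional: leaving it as a factor is also valid if each year should get its own coefficient.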

Answer 1 (score: 1)

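After dropping 'Event', we can create a column 'Number_of_Occurrences' set to 1, use complete to add every combination of the unique values of 'Year' and 'Location' (filling 'Number_of_Occurrences' with 0 for the added rows), then group by 'Year' and 'Location' and take the sum of 'Number_of_Occurrences'.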

library(tidyverse)
df1 %>%
   select(-Event) %>%
   mutate(Number_of_Occurrences  = 1) %>% 
   complete(Year, Location, fill = list(Number_of_Occurrences = 0)) %>% 
   group_by(Year, Location) %>% 
   summarise(Number_of_Occurrences = sum(Number_of_Occurrences)) %>% 
   arrange(Location)
# A tibble: 9 x 3
# Groups:   Year [3]
#   Year Location Number_of_Occurrences
#  <int>    <chr>                 <dbl>
#1  2001        A                     1
#2  2002        A                     3
#3  2003        A                     1
#4  2001        B                     2
#5  2002        B                     0
#6  2003        B                     1
#7  2001        C                     0
#8  2002        C                     1
#9  2003        C                     1

Data

df1 <- structure(list(Event = 1:10, Year = c(2001L, 2003L, 2001L, 2003L, 
 2002L, 2001L, 2002L, 2003L, 2002L, 2002L), Location = c("A", 
"B", "B", "A", "C", "B", "A", "C", "A", "A")), .Names = c("Event", 
"Year", "Location"), class = "data.frame", row.names = c(NA, 
-10L))
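A shorter tidyverse route, offered as a sketch rather than a replacement for the answer above: dplyr's count() groups and counts in one step, so the helper column of 1s and the later summarise() are not needed, and complete() then adds the missing Year/Location pairs with a count of 0. This assumes a dplyr version recent enough to support the name argument of count(); without it the count column is called n.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df1 <- data.frame(
  Event = 1:10,
  Year = c(2001L, 2003L, 2001L, 2003L, 2002L, 2001L, 2002L, 2003L, 2002L, 2002L),
  Location = c("A", "B", "B", "A", "C", "B", "A", "C", "A", "A"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  # one row per observed Year/Location pair, with its number of events
  count(Year, Location, name = "Number_of_Occurrences") %>%
  # add the pairs that never occur, with a count of 0
  complete(Year, Location, fill = list(Number_of_Occurrences = 0)) %>%
  # order by Location, then Year, to match the desired layout
  arrange(Location, Year)

result
```

Unlike the group_by()/summarise() version, this result carries no leftover grouping.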

Answer 2 (score: 1)

Here is a solution:

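library(tidyr)
# Count the events for each Year/Location pair that occurs in the data
result <- aggregate(Event ~ Year + Location, data = df1, FUN = function(x) length(unique(x)))
# Add the missing Year/Location pairs with a count of 0
result <- complete(result, Year, Location, fill = list(Event = 0))
# Sort by Location to match the desired layout
result[order(result$Location), ]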

Output:

   Year Location Event
1  2001        A     1
2  2002        A     3
3  2003        A     1
4  2001        B     2
5  2002        B     0
6  2003        B     1
7  2001        C     0
8  2002        C     1
9  2003        C     1
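If loading tidyr is not an option, the complete() step in this answer can be done in base R. A sketch, assuming the same sample data as above: expand.grid() builds every Year/Location pair, merge() with all.x = TRUE keeps the pairs that have no events (as NA), and those NAs are then set to 0. It aggregates with plain length, which gives the same counts as length(unique(x)) here because every Event ID is unique.

```r
# Sample data from the question
df1 <- data.frame(
  Event = 1:10,
  Year = c(2001L, 2003L, 2001L, 2003L, 2002L, 2001L, 2002L, 2003L, 2002L, 2002L),
  Location = c("A", "B", "B", "A", "C", "B", "A", "C", "A", "A"),
  stringsAsFactors = FALSE
)

# Count events for each Year/Location pair that occurs in the data
counts <- aggregate(Event ~ Year + Location, data = df1, FUN = length)

# Every combination of the observed years and locations
grid <- expand.grid(Year = sort(unique(df1$Year)),
                    Location = sort(unique(df1$Location)),
                    stringsAsFactors = FALSE)

# Left join onto the full grid; pairs without events get NA, then 0
result <- merge(grid, counts, by = c("Year", "Location"), all.x = TRUE)
result$Event[is.na(result$Event)] <- 0

# Sort by Location, then Year, and reset the row names
result <- result[order(result$Location, result$Year), ]
rownames(result) <- NULL

result
```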