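I'm very new to R and still learning. I want to take a list of individual events, each recording the year and the location of the event, and turn it into a dataset that counts how many events occurred at each location in each year. Here is a sample of my current dataset: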
Event|Year |Location
-----|-----|--------
1| 2001| A
2| 2003| B
3| 2001| B
4| 2003| A
5| 2002| C
6| 2001| B
7| 2002| A
8| 2003| C
9| 2002| A
10| 2002| A
Here is the dataset I'm looking for, so that I can run a regression on it:
Year|Location| Number of Occurrences
----|--------|----------------------
2001| A| 1
2002| A| 3
2003| A| 1
2001| B| 2
2002| B| 0
2003| B| 1
2001| C| 0
2002| C| 1
2003| C| 1
Answer 0 (score: 2)
The simplest approach is to use `table` to compute the frequencies and then convert the result to a data.frame:
data.frame(table(df1$Year, df1$Location))
Var1 Var2 Freq
1 2001 A 1
2 2002 A 3
3 2003 A 1
4 2001 B 2
5 2002 B 0
6 2003 B 1
7 2001 C 0
8 2002 C 1
9 2003 C 1
To give the columns meaningful names, you can use `setNames`, as follows:
setNames(data.frame(table(df1$Year, df1$Location)),
c("Year", "Location", "Event"))
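As an alternative to `setNames`, naming the arguments to `table()` stores those names in the table's dimnames, and `as.data.frame()` on a table accepts a `responseName` argument for the count column. A minimal sketch, rebuilding the question's sample data as `df1` and borrowing the column name `Number_of_Occurrences` from the question. Because `table()` turns Year into a factor, it is converted back to an integer here, which is usually what a regression needs.

```r
# The question's sample data
df1 <- data.frame(
  Event    = 1:10,
  Year     = c(2001L, 2003L, 2001L, 2003L, 2002L, 2001L, 2002L, 2003L, 2002L, 2002L),
  Location = c("A", "B", "B", "A", "C", "B", "A", "C", "A", "A"),
  stringsAsFactors = FALSE
)

# Named arguments become the dimnames, and therefore the column names
tab <- table(Year = df1$Year, Location = df1$Location)

# responseName names the count column directly (instead of the default "Freq")
counts <- as.data.frame(tab, responseName = "Number_of_Occurrences")

# table() made Year a factor; turn it back into a number for modelling
counts$Year <- as.integer(as.character(counts$Year))

counts
```

Zero counts (2002/B, 2001/C) appear automatically, because `table()` tabulates every combination of the observed levels.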
Answer 1 (score: 1)
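We can create a column 'Number_of_Occurrences' set to 1, use `complete` to add every combination of the unique 'Year' and 'Location' values (with `fill` setting the new rows to 0), and then, after grouping by 'Year' and 'Location', take the `sum` of 'Number_of_Occurrences':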
library(tidyverse)
df1 %>%
select(-Event) %>%
mutate(Number_of_Occurrences = 1) %>%
complete(Year, Location, fill = list(Number_of_Occurrences = 0)) %>%
group_by(Year, Location) %>%
summarise(Number_of_Occurrences = sum(Number_of_Occurrences)) %>%
arrange(Location)
# A tibble: 9 x 3
# Groups: Year [3]
# Year Location Number_of_Occurrences
# <int> <chr> <dbl>
#1 2001 A 1
#2 2002 A 3
#3 2003 A 1
#4 2001 B 2
#5 2002 B 0
#6 2003 B 1
#7 2001 C 0
#8 2002 C 1
#9 2003 C 1
Data:
df1 <- structure(list(Event = 1:10, Year = c(2001L, 2003L, 2001L, 2003L,
2002L, 2001L, 2002L, 2003L, 2002L, 2002L), Location = c("A",
"B", "B", "A", "C", "B", "A", "C", "A", "A")), .Names = c("Event",
"Year", "Location"), class = "data.frame", row.names = c(NA,
-10L))
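A shorter variant of the same tidyverse idea: `dplyr::count()` does the "create a 1 column, group, and sum" steps in one call, and `tidyr::complete()` then adds the missing Year/Location pairs with 0. A minimal sketch, assuming dplyr 1.0 or later (for the `name` argument of `count()`) and the question's data in `df1`:

```r
library(dplyr)
library(tidyr)

# The question's sample data
df1 <- data.frame(
  Event    = 1:10,
  Year     = c(2001L, 2003L, 2001L, 2003L, 2002L, 2001L, 2002L, 2003L, 2002L, 2002L),
  Location = c("A", "B", "B", "A", "C", "B", "A", "C", "A", "A"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  # one row per observed Year/Location pair, with the number of events in it
  count(Year, Location, name = "Number_of_Occurrences") %>%
  # add the unobserved pairs (2002/B, 2001/C) with a count of 0
  complete(Year, Location, fill = list(Number_of_Occurrences = 0L)) %>%
  # order by Location, then Year, to match the requested layout
  arrange(Location, Year)

result
```

Unlike the `group_by()`/`summarise()` pipeline, `count()` returns an ungrouped tibble, so no `ungroup()` is needed before modelling.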
Answer 2 (score: 1)
Here is a solution:
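library(tidyr)
# count the events in each Year/Location combination that appears in the data
result <- aggregate(Event ~ Year + Location, data = df1, FUN = function(x) length(unique(x)))
# add the missing combinations with a count of 0
result <- complete(result, Year, Location, fill = list(Event = 0))
result[order(result$Location), ]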
Output:
Year Location Event
1 2001 A 1
2 2002 A 3
3 2003 A 1
4 2001 B 2
5 2002 B 0
6 2003 B 1
7 2001 C 0
8 2002 C 1
9 2003 C 1
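If you would rather not depend on tidyr, the `complete()` step can be replaced with base R: `expand.grid()` builds every Year/Location pair and `merge(..., all.x = TRUE)` left-joins the counts onto it. This is a swapped-in base-R technique, not part of the answer above; a minimal sketch, assuming the question's data in `df1`. Since the Event IDs are unique, `length()` gives the same counts as `length(unique(x))`.

```r
# The question's sample data
df1 <- data.frame(
  Event    = 1:10,
  Year     = c(2001L, 2003L, 2001L, 2003L, 2002L, 2001L, 2002L, 2003L, 2002L, 2002L),
  Location = c("A", "B", "B", "A", "C", "B", "A", "C", "A", "A"),
  stringsAsFactors = FALSE
)

# Count events per observed Year/Location pair
counts <- aggregate(Event ~ Year + Location, data = df1, FUN = length)

# Every Year/Location pair, including those with no events
grid <- expand.grid(
  Year     = sort(unique(df1$Year)),
  Location = sort(unique(df1$Location)),
  stringsAsFactors = FALSE
)

# Left-join the counts onto the full grid; unmatched pairs get NA, then 0
result <- merge(grid, counts, by = c("Year", "Location"), all.x = TRUE)
result$Event[is.na(result$Event)] <- 0L

# Sort by Location, then Year, and rename the count column
result <- result[order(result$Location, result$Year), ]
names(result)[names(result) == "Event"] <- "Number_of_Occurrences"
result
```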