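Background: I'm analysing data on how people spend their time. Individuals fill in a diary recording what they were doing in every 10-minute episode of a given 24-hour period (there are 144 ten-minute episodes in 24 hours). I want to know how long people spend in specific places over those 24 hours.
My data is in long format. Individuals belong to households, and several people from each household (Household, the household serial number) are included in the data (Person). For example, the data set below contains data for 2 individuals who live in the household with serial number "15".
To keep my post short, each person in the example below has 12 rows (representing 6am to 8am), but in my actual data each person has 144 rows (i.e. 144 episodes of 10 minutes = 24 hours).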
Household Person Time Location
15 1 06:00-06:10 Home
15 1 06:10-06:20 Home
15 1 06:20-06:30 Park
15 1 06:30-06:40 Park
15 1 06:40-06:50 Park
15 1 06:50-07:00 Park
15 1 07:00-07:10 Park
15 1 07:10-07:20 Park
15 1 07:20-07:30 Park
15 1 07:30-07:40 Park
15 1 07:40-07:50 Work
15 1 07:50-08:00 Work
15 2 06:00-06:10 Park
15 2 06:10-06:20 Park
15 2 06:20-06:30 Park
15 2 06:30-06:40 Park
15 2 06:40-06:50 Park
15 2 06:50-07:00 Park
15 2 07:00-07:10 Home
15 2 07:10-07:20 Home
15 2 07:20-07:30 Home
15 2 07:30-07:40 Home
15 2 07:40-07:50 Home
15 2 07:50-08:00 Home
18 1 06:00-06:10 Home
18 1 06:10-06:20 Home
18 1 06:20-06:30 Home
18 1 06:30-06:40 Home
18 1 06:40-06:50 Home
18 1 06:50-07:00 Home
18 1 07:00-07:10 Park
18 1 07:10-07:20 Park
18 1 07:20-07:30 Park
18 1 07:30-07:40 Park
18 1 07:40-07:50 Park
18 1 07:50-08:00 Park
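My question: Suppose I want to know how long each person spends in the park. Essentially, I need to count how many times the string 'Park' appears in each person's Location vector (I can then multiply by 10 to get the total number of minutes). How would you do this?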
Answer 0 (score: 0)
You can use data.table to count the number of episodes per activity:
df <- read.table(text = "Household Person Time Location
15 1 06:00-06:10 Home
15 1 06:10-06:20 Home
15 1 06:20-06:30 Park
15 1 06:30-06:40 Park
15 1 06:40-06:50 Park
15 1 06:50-07:00 Park
15 1 07:00-07:10 Park
15 1 07:10-07:20 Park
15 1 07:20-07:30 Park
15 1 07:30-07:40 Park
15 1 07:40-07:50 Work
15 1 07:50-08:00 Work
15 2 06:00-06:10 Park
15 2 06:10-06:20 Park
15 2 06:20-06:30 Park
15 2 06:30-06:40 Park
15 2 06:40-06:50 Park
15 2 06:50-07:00 Park
15 2 07:00-07:10 Home
15 2 07:10-07:20 Home
15 2 07:20-07:30 Home
15 2 07:30-07:40 Home
15 2 07:40-07:50 Home
15 2 07:50-08:00 Home
18 1 06:00-06:10 Home
18 1 06:10-06:20 Home
18 1 06:20-06:30 Home
18 1 06:30-06:40 Home
18 1 06:40-06:50 Home
18 1 06:50-07:00 Home
18 1 07:00-07:10 Park
18 1 07:10-07:20 Park
18 1 07:20-07:30 Park
18 1 07:30-07:40 Park
18 1 07:40-07:50 Park
18 1 07:50-08:00 Park", header = TRUE)
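library(data.table)
df <- data.table(df)
# Person numbers restart in each household, so group by Household as well;
# otherwise person 1 of household 15 and person 1 of household 18 are merged.
df[, .(CountActivity = .N), by = .(Household, Person, Location)]
Household Person Location CountActivity
1: 15 1 Home 2
2: 15 1 Park 8
3: 15 1 Work 2
4: 15 2 Park 6
5: 15 2 Home 6
6: 18 1 Home 6
7: 18 1 Park 6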
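To get straight to the figure asked for in the question (minutes in the park per person), the count can be restricted to one location and multiplied by 10. The following is only a minimal sketch under stated assumptions: `diary` is a small made-up stand-in for the real diary data, and `ParkMinutes` is a hypothetical column name. It groups by Household as well as Person because person numbers restart in each household.

```r
library(data.table)

# Made-up stand-in for the diary data: one row per 10-minute episode.
diary <- data.table(
  Household = c(15L, 15L, 15L, 15L, 18L, 18L),
  Person    = c(1L, 1L, 1L, 2L, 1L, 1L),
  Location  = c("Home", "Park", "Park", "Home", "Home", "Park")
)

# sum(Location == "Park") counts the Park episodes in each group;
# each episode lasts 10 minutes, so multiply by 10.
# Summing a logical (rather than filtering first) keeps people
# who never visit the park, with a value of 0.
park_minutes <- diary[, .(ParkMinutes = sum(Location == "Park") * 10L),
                      by = .(Household, Person)]

print(park_minutes)
# ParkMinutes is 20, 0 and 10 for the three people in this toy data.
```

Filtering first with `diary[Location == "Park", .N, by = ...]` would also work, but people with no Park episodes would then be missing from the result instead of showing 0.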
Answer 1 (score: 0)
You can use the sqldf package:
library(sqldf)
sqldf("SELECT Household, Person, Location, COUNT(*) AS Freq
FROM df
GROUP BY Household, Person, Location")
## Household Person Location Freq
## 1 15 1 Home 2
## 2 15 1 Park 8
## 3 15 1 Work 2
## 4 15 2 Home 6
## 5 15 2 Park 6
## 6 18 1 Home 6
## 7 18 1 Park 6
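The same query style can return park minutes directly rather than raw counts for every location. This is a hedged sketch, not a definitive solution: `diary` is a small made-up stand-in for the real data, and `ParkMinutes` is a hypothetical column alias. Household is part of the GROUP BY because person numbers restart in each household.

```r
library(sqldf)

# Made-up stand-in for the diary data: one row per 10-minute episode.
diary <- data.frame(
  Household = c(15L, 15L, 15L, 15L, 18L, 18L),
  Person    = c(1L, 1L, 1L, 2L, 1L, 1L),
  Location  = c("Home", "Park", "Park", "Home", "Home", "Park"),
  stringsAsFactors = FALSE
)

# The CASE expression yields 1 for Park episodes and 0 otherwise,
# so SUM(...) counts Park episodes per person; * 10 converts to minutes.
# People with no Park episodes stay in the result with 0.
park_minutes <- sqldf("
  SELECT Household, Person,
         SUM(CASE WHEN Location = 'Park' THEN 1 ELSE 0 END) * 10 AS ParkMinutes
  FROM diary
  GROUP BY Household, Person
  ORDER BY Household, Person")

print(park_minutes)
# ParkMinutes is 20, 0 and 10 for the three people in this toy data.
```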
Data:
df <- structure(list(Household = c(15L, 15L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 15L, 15L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L,
18L, 18L, 18L), Person = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Time = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L), .Label = c("06:00-06:10", "06:10-06:20",
"06:20-06:30", "06:30-06:40", "06:40-06:50", "06:50-07:00", "07:00-07:10",
"07:10-07:20", "07:20-07:30", "07:30-07:40", "07:40-07:50", "07:50-08:00"
), class = "factor"), Location = structure(c(1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("Home", "Park", "Work"), class = "factor")), .Names = c("Household",
"Person", "Time", "Location"), row.names = c(NA, 36L), class = "data.frame")