I have a dataset organized like this:
ID Species DateTime
P1 A 2015-03-16 18:42:00
P2 A 2015-03-16 19:34:00
P3 A 2015-03-16 19:58:00
P4 A 2015-03-16 21:02:00
P5 B 2015-03-16 21:18:00
P6 A 2015-03-16 21:19:00
P7 A 2015-03-16 21:33:00
P8 B 2015-03-16 21:35:00
P9 B 2015-03-16 23:43:00
Within each species, I want to keep only independent pictures, i.e. pictures that are at least 1 hour apart from each other.
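In this example, for species A I only want to keep P1, P3 and P4. P2 is not kept because it falls within the 1 h window that starts at P1 (18h42 to 19h42). P3 is kept because its DateTime (19h58) falls after 19h42; the next 1 h window then starts at P3 and runs until 20h58. P4 (21h02) falls after 20h58, so it is kept and opens a window until 22h02, which excludes P6 and P7. For species B, only P5 and P9 are kept (P8 falls within the window opened by P5).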
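The key point of the rule is that each window restarts at the last *kept* picture, not at the previous picture. A minimal base-R trace of that rule on the species A rows, assuming UTC timestamps and assuming that a gap of exactly one hour still counts as independent:

```r
# Species A pictures from the question, in time order (UTC assumed)
ids   <- c("P1", "P2", "P3", "P4", "P6", "P7")
times <- as.POSIXct(c("2015-03-16 18:42:00", "2015-03-16 19:34:00",
                      "2015-03-16 19:58:00", "2015-03-16 21:02:00",
                      "2015-03-16 21:19:00", "2015-03-16 21:33:00"),
                    tz = "UTC")

gap <- 3600              # 1 hour, in seconds
window_end <- -Inf       # no window is open before the first picture
kept <- character(0)

for (i in seq_along(times)) {
  t <- as.numeric(times[i])          # seconds since the epoch
  if (t >= window_end) {
    # Outside the current window: independent, and it opens a new window
    kept <- c(kept, ids[i])
    window_end <- t + gap
    cat(ids[i], "kept; window now ends at",
        format(times[i] + gap, "%H:%M"), "\n")
  } else {
    cat(ids[i], "dropped\n")
  }
}

print(kept)   # "P1" "P3" "P4"
```

The printed trace shows the windows ending at 19:42, 20:58 and 22:02, matching the reasoning above.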
So, after this filter, my dataset would look like this:
ID Species DateTime
P1 A 2015-03-16 18:42:00
P3 A 2015-03-16 19:58:00
P4 A 2015-03-16 21:02:00
P5 B 2015-03-16 21:18:00
P9 B 2015-03-16 23:43:00
Does anyone know how to do this in R?
Answer 0 (score: 1)
There may be a more elegant way, but this works:
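library(dplyr)

# Keep a picture only if it is at least 1 hour (3600 s) after the last
# picture that was kept; assumes dt is sorted in time order.
isHourApart <- function(dt) {
  last_kept <- -Inf               # nothing kept yet
  keeps <- logical(length(dt))    # all FALSE to start
  for (i in seq_along(dt)) {
    d <- as.numeric(dt[i])        # seconds since the epoch
    if (d >= last_kept + 60 * 60) {
      last_kept <- d
      keeps[i] <- TRUE
    }
  }
  keeps
}

df %>%
  arrange(Species, DateTime) %>%
  group_by(Species) %>%
  filter(isHourApart(DateTime))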
# A tibble: 5 x 3
# Groups: Species [2]
ID Species DateTime
<chr> <fct> <dttm>
1 P1 A 2015-03-16 18:42:00
2 P3 A 2015-03-16 19:58:00
3 P4 A 2015-03-16 21:02:00
4 P5 B 2015-03-16 21:18:00
5 P9 B 2015-03-16 23:43:00
Note that the DateTime column is of class POSIXct.
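The same greedy loop can also be applied per species without dplyr. A minimal base-R sketch, assuming UTC timestamps; `keep_independent` is a hypothetical helper name, and `ave()` is used to run it within each species:

```r
# Hypothetical helper: keep a time only if it is at least gap_secs after
# the last kept time. Expects times sorted in ascending order.
keep_independent <- function(times, gap_secs = 3600) {
  t <- as.numeric(times)          # seconds since the epoch
  keep <- logical(length(t))      # all FALSE to start
  last_kept <- -Inf
  for (i in seq_along(t)) {
    if (t[i] - last_kept >= gap_secs) {
      keep[i] <- TRUE
      last_kept <- t[i]
    }
  }
  keep
}

# The question's data, with DateTime as POSIXct (UTC assumed)
df <- data.frame(
  ID = paste0("P", 1:9),
  Species = c("A", "A", "A", "A", "B", "A", "A", "B", "B"),
  DateTime = as.POSIXct(c("2015-03-16 18:42:00", "2015-03-16 19:34:00",
                          "2015-03-16 19:58:00", "2015-03-16 21:02:00",
                          "2015-03-16 21:18:00", "2015-03-16 21:19:00",
                          "2015-03-16 21:33:00", "2015-03-16 21:35:00",
                          "2015-03-16 23:43:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Sort by species, then time, so each group is in time order
df <- df[order(df$Species, df$DateTime), ]

# ave() runs the helper within each species and returns results in row
# order; the numeric 0/1 it returns is converted back to logical
keep <- as.logical(ave(as.numeric(df$DateTime), df$Species,
                       FUN = keep_independent))
result <- df[keep, ]
print(result)
```

`ave()` was chosen here because it keeps the per-group results aligned with the original rows, so no explicit split and recombine step is needed.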
Answer 1 (score: 1)
Here is a dplyr solution:
library(dplyr)

df <- read.table(text = "ID Species DateTime
P1 A '2015-03-16 18:42:00'
P2 A '2015-03-16 19:34:00'
P3 A '2015-03-16 19:58:00'
P4 A '2015-03-16 21:02:00'
P5 B '2015-03-16 21:18:00'
P6 A '2015-03-16 21:19:00'
P7 A '2015-03-16 21:33:00'
P8 B '2015-03-16 21:35:00'
P9 B '2015-03-16 23:43:00'", header = TRUE)

df %>%
  mutate(DateTime = as.POSIXct(DateTime)) %>%
  arrange(Species, DateTime) %>%
  group_by(Species) %>%
  mutate(
    # minutes since the previous picture of the same species (0 for the first);
    # explicit units avoid difftime's automatic choice of secs/mins/hours
    diff = as.numeric(difftime(DateTime, lag(DateTime), units = "mins")),
    diff = ifelse(is.na(diff), 0, diff),
    # whole hours elapsed since the species' first picture
    cumdiff = cumsum(diff) %/% 60,
    # > 0 when this picture starts a new hour bin
    x = cumdiff - lag(cumdiff)) %>%
  filter(is.na(x) | x > 0) %>%
  select(ID, Species, DateTime) %>%
  ungroup() %>%
  as.data.frame()
#   ID Species            DateTime
#1  P1       A 2015-03-16 18:42:00
#2  P3       A 2015-03-16 19:58:00
#3  P4       A 2015-03-16 21:02:00
#4  P5       B 2015-03-16 21:18:00
#5  P9       B 2015-03-16 23:43:00
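In this answer, `cumdiff` is the number of whole hours since the species' first picture, so the filter keeps the first picture in each fixed hour bin counted from that first picture. That gives the expected result on this data, but it is not the same as the question's rule, where the window restarts at each kept picture. A minimal base-R check with hypothetical times 0, 70 and 125 minutes after a start (one species, UTC assumed):

```r
start <- as.POSIXct("2015-03-16 18:00:00", tz = "UTC")
times <- start + c(0, 70, 125) * 60   # hypothetical pictures, one species

# Fixed hour bins counted from the first picture (the cumdiff idea)
mins_since_first <- as.numeric(difftime(times, times[1], units = "mins"))
bin <- mins_since_first %/% 60                  # 0, 1, 2
keep_bins <- c(TRUE, diff(bin) > 0)             # first picture in each bin

# The question's rule: the window restarts at the last kept picture
keep_reset <- logical(length(times))
last_kept <- -Inf
for (i in seq_along(times)) {
  t <- as.numeric(times[i])
  if (t - last_kept >= 3600) {
    keep_reset[i] <- TRUE
    last_kept <- t
  }
}

print(keep_bins)    # TRUE TRUE TRUE
print(keep_reset)   # TRUE TRUE FALSE
```

The binned version keeps the pictures at 70 and 125 minutes even though they are only 55 minutes apart, because they fall into different hour bins.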
Answer 2 (score: 1)
Here is a solution using data.table:
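Answer 3 (score: 0)
We can simply create a new column of 60-minute intervals (the intervals start at the earliest DateTime in the whole column, 18:42 here) and then keep the first occurrence of each Species within each interval.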
df %>%
mutate(by60 = cut(DateTime, "60 min")) %>%
group_by(Species, by60) %>%
slice(1)
Output 1
# A tibble: 5 x 4
# Groups: Species, by60 [5]
ID Species DateTime by60
<chr> <chr> <dttm> <fct>
1 P1 A 2015-03-16 18:42:00 2015-03-16 18:42:00
2 P3 A 2015-03-16 19:58:00 2015-03-16 19:42:00
3 P4 A 2015-03-16 21:02:00 2015-03-16 20:42:00
4 P5 B 2015-03-16 21:18:00 2015-03-16 20:42:00
5 P9 B 2015-03-16 23:43:00 2015-03-16 23:42:00
If we want to remove the helper column:
df %>%
mutate(by60 = cut(DateTime, "60 min")) %>%
group_by(Species, by60) %>%
slice(1) %>%
ungroup() %>%
select(-by60)
Output 2
# A tibble: 5 x 3
ID Species DateTime
<chr> <chr> <dttm>
1 P1 A 2015-03-16 18:42:00
2 P3 A 2015-03-16 19:58:00
3 P4 A 2015-03-16 21:02:00
4 P5 B 2015-03-16 21:18:00
5 P9 B 2015-03-16 23:43:00
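`cut(DateTime, "60 min")` builds a single set of hour bins for the whole column, starting at its earliest timestamp (18:42 in the first output above), and every species shares those bins. As with fixed bins in general, two pictures on either side of a bin boundary can both be kept even when they are less than an hour apart. A small base-R check with hypothetical rows Q1 to Q3 (UTC assumed), where `!duplicated()` stands in for `group_by(Species, by60) %>% slice(1)` on time-ordered rows:

```r
# Hypothetical rows: species A sets the earliest time, so bins start at 18:42
pics <- data.frame(
  ID = c("Q1", "Q2", "Q3"),
  Species = c("A", "B", "B"),
  DateTime = as.POSIXct(c("2015-03-16 18:42:00", "2015-03-16 22:30:00",
                          "2015-03-16 22:50:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hour bins anchored at the earliest DateTime: ..., [21:42, 22:42), [22:42, 23:42)
pics$by60 <- cut(pics$DateTime, "60 min")

# First row per (Species, bin), as slice(1) does on time-ordered rows
kept <- pics[!duplicated(pics[c("Species", "by60")]), "ID"]
print(kept)   # "Q1" "Q2" "Q3"
```

Q2 (22:30) and Q3 (22:50) are only 20 minutes apart, but they straddle the 22:42 boundary, so both are kept; the reset rule from the question would keep only Q2.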