Question

I have a data frame with 150,000+ rows, but here is an example of what I am trying to achieve:

 TIME_REAL   HR Behaviour
 21:15:00   54  Eupnoea
 21:15:01   107 Eupnoea
 21:15:02   118 Eupnoea
 21:15:03   75  Eupnoea
 21:15:04   94  Eupnoea
 21:15:05   57  Eupnoea
 21:15:06   106 Eupnoea
 21:15:07   121 Eupnoea
 21:15:08   104 Eupnoea
 21:15:09   73  Eupnoea
 21:15:10   114 Apnoea
 21:15:11   108 Apnoea
 21:15:12   121 Apnoea
 21:15:13   117 Apnoea
 21:15:14   110 Apnoea
 21:15:15   38  Eupnoea
 21:15:16   120 Eupnoea
 21:15:17   118 Eupnoea
 21:15:18   82  Eupnoea
 21:15:19   107 Eupnoea
 21:15:20   44  Apnoea

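I am trying to calculate the duration of each behavioural event (Behaviour is a factor), so the first eupnoea event would be 9 seconds long, followed by a 4-second apnoea event, and so on. Ideally I would like a separate table or column with the time each event occurred and the duration of the behavioural event. I have tried using the dplyr package, but without any success. I would also like to calculate the mean HR for each behavioural event. Is there any way to do this in R?

Thanks in advance!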

Answer 1

library(tidyverse)

tbl <- tribble(
  ~TIME_REAL,   ~HR, ~Behaviour,
  "21:15:00",   54,  "Eupnoea",
  "21:15:01",   107, "Eupnoea",
  "21:15:02",   118, "Eupnoea",
  "21:15:03",   75,  "Eupnoea",
  "21:15:04",   94,  "Eupnoea",
  "21:15:05",   57,  "Eupnoea",
  "21:15:06",   106, "Eupnoea",
  "21:15:07",   121, "Eupnoea",
  "21:15:08",   104, "Eupnoea",
  "21:15:09",   73,  "Eupnoea",
  "21:15:10",   114, "Apnoea",
  "21:15:11",   108, "Apnoea",
  "21:15:12",   121, "Apnoea",
  "21:15:13",   117, "Apnoea",
  "21:15:14",   110, "Apnoea",
  "21:15:15",   38,  "Eupnoea",
  "21:15:16",   120, "Eupnoea",
  "21:15:17",   118, "Eupnoea",
  "21:15:18",   82,  "Eupnoea",
  "21:15:19",   107, "Eupnoea",
  "21:15:20",   44,  "Apnoea"
)

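# Run-length encode the behaviour column: one entry per uninterrupted event
myle <- rle(tbl$Behaviour)
tbl %>% 
  # Tag every row with the index of the run (event) it belongs to
  mutate(code = rep(seq_along(myle$values), myle$lengths)) %>%
  group_by(Behaviour, code) %>%
  # N is the number of rows in the event, mean is the mean HR of the event
  summarise(N = n(), mean = mean(HR)) %>% 
  arrange(code)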

Answer 2

Here is how to do this with dplyr, with the help of rleid from data.table. I use rleid because it is an easy way to add a group number. I also use as.POSIXct to convert the time column to a time object, which makes it easier to manipulate.

library(dplyr)
df %>%
  mutate(TIME_REAL = as.POSIXct(TIME_REAL, format = "%H:%M:%S"),
         behaviour_number = data.table::rleid(Behaviour)) %>%
  group_by(behaviour_number) %>%
  summarise(behaviour = max(Behaviour),
            elapsed = max(TIME_REAL) - min(TIME_REAL),
            HR_avg = mean(HR, na.rm = TRUE))

  behaviour_number behaviour elapsed HR_avg
             <int>     <chr>  <time>  <dbl>
1                1   Eupnoea  9 secs   90.9
2                2    Apnoea  4 secs  114.0
3                3   Eupnoea  4 secs   93.0
4                4    Apnoea  0 secs   44.0

Data

df <- read.table(text="TIME_REAL   HR Behaviour
 21:15:00   54  Eupnoea
 21:15:01   107 Eupnoea
 21:15:02   118 Eupnoea
 21:15:03   75  Eupnoea
 21:15:04   94  Eupnoea
 21:15:05   57  Eupnoea
 21:15:06   106 Eupnoea
 21:15:07   121 Eupnoea
 21:15:08   104 Eupnoea
 21:15:09   73  Eupnoea
 21:15:10   114 Apnoea
 21:15:11   108 Apnoea
 21:15:12   121 Apnoea
 21:15:13   117 Apnoea
 21:15:14   110 Apnoea
 21:15:15   38  Eupnoea
 21:15:16   120 Eupnoea
 21:15:17   118 Eupnoea
 21:15:18   82  Eupnoea
 21:15:19   107 Eupnoea
 21:15:20   44  Apnoea",header=TRUE,stringsAsFactors=FALSE)


Answer 3

You can try the following:

library(dplyr)

# Parse the clock times so that time differences can be computed
dff$TIME_REAL <- as.POSIXct(strptime(dff$TIME_REAL, '%H:%M:%S'))

# Return one integer id per row, constant within each uninterrupted run of values
make_splitter <- function(col_vals) {
  rle_lengths <- rle(as.character(col_vals))$lengths
  rep(seq_along(rle_lengths), rle_lengths)
}

dff %>%
  group_by(splitter = make_splitter(Behaviour), Behaviour) %>%
  summarise(Average_HR = mean(HR),
            Start_Time = strftime(head(TIME_REAL, 1), '%H:%M:%S'),
            End_Time = strftime(tail(TIME_REAL, 1), '%H:%M:%S'),
            Duration = difftime(tail(TIME_REAL, 1), head(TIME_REAL, 1)))

The make_splitter function helps define how the data frame is split. Here I use the rle function and some replication to get the desired column.
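To make the rle-and-replication step concrete, here is a minimal sketch on a made-up vector (the values and the names behaviour, runs and run_id are illustrative assumptions, not taken from the question's data). It only shows how a run id per row can be built.

```r
# Toy behaviour labels (illustrative only)
behaviour <- c("Eupnoea", "Eupnoea", "Apnoea", "Apnoea", "Apnoea", "Eupnoea")

# rle() collapses consecutive repeats into (value, length) pairs
runs <- rle(behaviour)
runs$values   # "Eupnoea" "Apnoea" "Eupnoea"
runs$lengths  # 2 3 1

# Repeat each run's index by that run's length: one id per original row
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id        # 1 1 2 2 2 3
```

Because the third run gets its own id, a second Eupnoea event is kept separate from the first one when this vector is used as a grouping column.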

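This assumes your current data frame is named dff. The TIME_REAL column is coerced to POSIXct objects before any calculations are done on it. After that, dplyr is used to group by the splitter column and the Behaviour column, and the summarise function gives the mean HR and the time difference.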

This should yield:

  Behaviour Average_HR Start_Time End_Time Duration
     <fctr>      <dbl>      <chr>    <chr>   <time>
1   Eupnoea       90.9   21:15:00 21:15:09   9 secs
2    Apnoea      114.0   21:15:10 21:15:14   4 secs
3   Eupnoea       93.0   21:15:15 21:15:19   4 secs
4    Apnoea       44.0   21:15:20 21:15:20   0 secs

I hope this helps.

Answer 4

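When working with dates, I suggest you use lubridate.

Here, you need to build a full date-time in order to work with dates. For the sake of this example, let's assume the day is today.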

library(tidyverse)
try <- tribble(
  ~TIME_REAL, ~Behaviour,
  "21:15:00", "Eupnoea",
  "21:15:03", "Eupnoea",
  "21:15:04", "Eupnoea",
  "21:15:09", "Eupnoea",
  "21:15:10", "Apnoea",
  "21:15:15", "Apnoea",
  "21:15:17", "Apnoea",
  "21:15:18", "Apnoea"
)
library(lubridate)
try %>%
  mutate(TIME_REAL = paste(today(), TIME_REAL)) %>%
  mutate(TIME_REAL = ymd_hms(TIME_REAL)) %>%
  group_by(Behaviour) %>%
  summarize(time = max(TIME_REAL) - min(TIME_REAL))


# A tibble: 2 x 2
  Behaviour   time
      <chr> <time>
1    Apnoea 8 secs
2   Eupnoea 9 secs

Here the two mutate calls convert the time into an ISO 8601 date-time. You can then group_by and do basic math.
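For comparison, the same idea can be sketched in base R instead of lubridate. This is only an illustration on made-up clock times; the names clock, stamps and elapsed, and the choice of UTC, are assumptions.

```r
# Made-up clock times (illustrative only)
clock <- c("21:15:00", "21:15:04", "21:15:09")

# Prepend a date (today's, an arbitrary choice) to get full date-times
stamps <- as.POSIXct(paste(Sys.Date(), clock),
                     format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Elapsed time between the first and last timestamp, in seconds
elapsed <- as.numeric(difftime(max(stamps), min(stamps), units = "secs"))
elapsed  # 9
```

Which date is prepended does not matter here, since only differences between times on the same day are used.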

Hope this helps.

Colin

Calculating elapsed "time" in R, where the time depends on a factor
