Question

I have a table of excavation data. It lists features and their corresponding date ranges, like this:

feature_id   object_type_id    amount    date_id    chronology
156          46                3          3          2300-2200
156          46                3          4          2200-2100
156          46                3          5          2100-2000
274          37                1          4          2200-2100
274          37                1          5          2100-2000

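As you can see, the table is highly redundant. I want to clean it up so that I get only one row per feature_id per object_type_id, with all the chronology and date_id clutter replaced by a start and a stop time. For example: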

feature_id    object_type_id    amount   start_chronology    stop_chronology
156           46                3        2300                2000

How can I achieve this? I'm confused and lost.

Answer 1

This is easy with the tidyverse packages:

df <- read.table(text = 'feature_id   object_type_id    amount    date_id    chronology
156          46                3          3          2300-2200
156          46                3          4          2200-2100
156          46                3          5          2100-2000
274          37                1          4          2200-2100
274          37                1          5          2100-2000', header = TRUE)

library(tidyverse)

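df.new <- df %>% 
  # split "2300-2200" into two columns; convert = TRUE makes them integers,
  # so max()/min() compare numbers rather than strings
  separate(chronology, c('start', 'end'), convert = TRUE) %>% 
  group_by(feature_id, object_type_id) %>% 
  summarize(
    amount = unique(amount),        # assumes amount is constant within a group
    start_chronology = max(start),  # earliest start of the range
    stop_chronology = min(end)      # latest end of the range
  )

  feature_id object_type_id amount start_chronology stop_chronology
       <int>          <int>  <int>            <int>           <int>
1        156             46      3             2300            2000
2        274             37      1             2200            2000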

Answer 2

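Assuming you want one row per feature_id and object_type_id, consider base R: split the chronology column on the hyphen inside within, then call aggregate, passing it the two new columns so that two functions are run on them, and finish with some column cleanup.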
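The code for this answer is not included above, so the following is only a minimal sketch of the steps it describes, not the answerer's original code. It assumes the same sample data as in Answer 1, that amount is constant within each feature_id/object_type_id pair (so it can be used as a grouping variable), and the intermediate column names start and end, which are chosen here for illustration.

```r
df <- read.table(text = "feature_id object_type_id amount date_id chronology
156 46 3 3 2300-2200
156 46 3 4 2200-2100
156 46 3 5 2100-2000
274 37 1 4 2200-2100
274 37 1 5 2100-2000", header = TRUE, stringsAsFactors = FALSE)

# Split "2300-2200" on the hyphen into two integer columns.
df <- within(df, {
  start <- as.integer(sub("-.*$", "", chronology))  # part before the hyphen
  end   <- as.integer(sub("^.*-", "", chronology))  # part after the hyphen
})

# Run two functions (max and min) on both columns for every group.
# Each result column is a two-column matrix named "max" and "min".
agg <- aggregate(cbind(start, end) ~ feature_id + object_type_id + amount,
                 data = df,
                 FUN = function(x) c(max = max(x), min = min(x)))

# Column cleanup: keep max(start) and min(end), drop the rest.
res <- data.frame(agg[c("feature_id", "object_type_id", "amount")],
                  start_chronology = agg$start[, "max"],
                  stop_chronology  = agg$end[, "min"])

# aggregate() does not keep the input row order, so sort explicitly.
res <- res[order(res$feature_id), ]
print(res)
```

Computing both max and min for both columns does a little redundant work, but it keeps everything in a single aggregate call; the cleanup step then picks the one statistic needed from each column.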
