Question

I have a data frame like this:

df = data.frame(main_name = c("google","yahoo","google","amazon","yahoo","google"),
                volume = c(32,43,412,45,12,54))

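I would like to sort it by main_name, as in the example below.

The purpose is to know at which row a specific phrase starts and at which row it ends, so that the range can be used in a for loop.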

main_name volume
amazon     45
google     32
google     412
google     54
yahoo      43
yahoo      12

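The specific phrases do not need to be known in advance. The code only has to check where the phrase changes and report the start and end row numbers, for example: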

amazon [1]
google [2:4]
yahoo  [5:6]

Answer 1

Using the tidyverse:

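library(dplyr)  # provides %>%, arrange(), mutate(), group_by(), summarise()

df %>%
   arrange(main_name) %>%                  # sort by main_name
   mutate(row = row_number()) %>%          # row number after sorting
   group_by(main_name) %>%
   summarise(start = first(row),           # first row of each group
             end = last(row)) %>%          # last row of each group
   mutate(res = glue::glue("[{start}:{end}]"))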
# A tibble: 3 x 4
  main_name start   end res  
  <fct>     <int> <int> <chr>
1 amazon        1     1 [1:1]
2 google        2     4 [2:4]
3 yahoo         5     6 [5:6]

Answer 2

Here is an alternative base R solution using rle:

with(rle(as.character(df$main_name)), setNames(mapply(
    function(x, y) sprintf("[%s:%s]", x, y),
    cumsum(lengths) - lengths + 1, cumsum(lengths)), values))
# amazon  google   yahoo
#"[1:1]" "[2:4]" "[5:6]"

Sample data

df <- read.table(text =
"main_name volume
amazon     45
google     32
google     412
google     54
yahoo      43
yahoo      12", header = TRUE)
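
The rle() arithmetic above can be spelled out step by step. The following is a minimal sketch, assuming the unsorted data frame from the question; the names sorted, runs and ranges are illustrative and not part of the original answer. Since rle() only detects consecutive runs, the data is sorted first, and the numeric start and end rows are kept so they can drive a for loop directly.

```r
# Data frame from the question (unsorted).
df <- data.frame(main_name = c("google", "yahoo", "google", "amazon", "yahoo", "google"),
                 volume = c(32, 43, 412, 45, 12, 54))

# rle() only sees consecutive runs, so sort by main_name first.
sorted <- df[order(df$main_name), ]

runs <- rle(as.character(sorted$main_name))
end_row <- cumsum(runs$lengths)             # last row of each run
start_row <- end_row - runs$lengths + 1L    # first row of each run

# Illustrative helper table holding one row range per phrase.
ranges <- data.frame(main_name = runs$values,
                     start = start_row,
                     end = end_row,
                     stringsAsFactors = FALSE)

# Loop over the phrases using the numeric row ranges.
for (i in seq_len(nrow(ranges))) {
  rows <- ranges$start[i]:ranges$end[i]
  cat(ranges$main_name[i], "total volume:", sum(sorted$volume[rows]), "\n")
}
```

Keeping start and end as numbers, rather than a formatted string such as "[2:4]", avoids having to parse the string again inside the loop.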

Answer 3

Here is another base R option:

with(df, tapply(seq_along(main_name), main_name, FUN = 
  function(x) do.call(sprintf, c(fmt = "[%d:%d]", as.list(range(x))))))
#  amazon  google   yahoo 
# "[1:1]" "[2:4]" "[5:6]"
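
The tapply() call above takes range() of each group's row indices, which matches the first and last row only when the rows of a group are contiguous, that is, when df is already sorted by main_name as in the sample data. A small sketch of that point, assuming the unsorted data frame from the question; the names sorted, idx and bounds are illustrative and not part of the answer:

```r
# Unsorted data frame from the question.
df <- data.frame(main_name = c("google", "yahoo", "google", "amazon", "yahoo", "google"),
                 volume = c(32, 43, 412, 45, 12, 54))

# On unsorted data, range() spans rows that belong to other groups:
# google sits in rows 1, 3 and 6, so the range is 1 to 6.
unsorted_google <- range(which(df$main_name == "google"))

# After sorting, each group occupies one contiguous block of rows.
sorted <- df[order(df$main_name), ]
idx <- split(seq_len(nrow(sorted)), as.character(sorted$main_name))

# One row per phrase, with numeric start and end columns.
bounds <- t(sapply(idx, range))
colnames(bounds) <- c("start", "end")
print(bounds)
```

Sorting first is therefore a precondition for this approach, not an optional step.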

Automatically detect the start and end row numbers of a phrase
