Question

I currently have a data frame containing a list of URLs, and I am trying to find the top 10 URLs by frequency. This is what I mean:

    +------------+
    |urls        |
    +------------+
    |google.com  |
    |linkedin.com|
    |yahoo.com   |
    |google.com  |
    |yahoo.com   |
    +------------+

I was trying to add a freq column, but I can't seem to get it. I tried count(df, "url"), but it only gave me the frequencies without the URLs, like this:

    +----+
    |freq|
    +----+
    |2   |
    |1   |
    |2   |
    |2   |
    |2   |
    +----+

How can I get a data frame like this?

    +---------------+------------+
    |urls           |   freq     |
    +---------------+------------+
    |google.com     |   2        |
    |linkedin.com   |   1        |
    |yahoo.com      |   2        |
    |google.com     |   2        |
    |yahoo.com      |   2        |      
    +---------------+------------+

I also need to rank the URLs and keep the top 10. How can I do that?

Answer 1

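table() returns the frequency of each URL. You can then sort the result in decreasing order and select the top 10. Using head() instead of [1:10] avoids NA entries when there are fewer than 10 distinct URLs.

    sort(table(df$urls), decreasing = TRUE) |> head(10)

If you only want the URL names:

    names(head(sort(table(df$urls), decreasing = TRUE), 10))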
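As a minimal base R sketch of the data frame the question asks for: the sample data below is copied from the question, while `counts` and `top` are illustrative names rather than anything from the original post. It first adds a per-row freq column by looking each URL up in the table, then builds a one-row-per-URL summary sorted by frequency.

```r
# Sample data mirroring the question (assumed to be a plain character column)
df <- data.frame(
  urls = c("google.com", "linkedin.com", "yahoo.com", "google.com", "yahoo.com"),
  stringsAsFactors = FALSE
)

# Count each distinct URL once
counts <- table(df$urls)

# Per-row frequency: index the table by URL name
df$freq <- as.integer(counts[df$urls])

# One row per URL, most frequent first; head() returns at most 10 rows
top <- as.data.frame(sort(counts, decreasing = TRUE), stringsAsFactors = FALSE)
names(top) <- c("urls", "freq")
top <- head(top, 10)

print(df)
print(top)
```

Here `df` keeps every original row with its count attached, while `top` holds the ranked distinct URLs, which is probably closer to what "top 10" means.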

Answer 2

Here is a tidyverse solution. Use group_by and n to get the count for each URL, then use arrange to order the rows.

    library('tidyverse')

    # Sample data from the question (no stray trailing spaces, so identical URLs group together)
    df <- tibble(urls = c('google.com', 'linkedin.com', 'yahoo.com', 'google.com', 'yahoo.com'))

    df %>%
      group_by(urls) %>%             # one group per distinct URL
      mutate(freq = n()) %>%         # attach the group size to every row
      arrange(desc(freq)) %>%        # most frequent first
      head(10)
    #> # A tibble: 5 x 2
    #> # Groups:   urls [3]
    #>           urls  freq
    #>          <chr> <int>
    #> 1   google.com     2
    #> 2    yahoo.com     2
    #> 3   google.com     2
    #> 4    yahoo.com     2
    #> 5 linkedin.com     1
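If the goal is the top 10 distinct URLs rather than the top 10 rows, a possible variation is to collapse the data with dplyr's count() before ranking. This is only a sketch: the sample data is copied from the question, and `top_urls` is an illustrative name.

```r
library(dplyr)

# Sample data mirroring the question
df <- tibble(urls = c("google.com", "linkedin.com", "yahoo.com", "google.com", "yahoo.com"))

# count() returns one row per URL; sort = TRUE orders by descending count,
# and name = "freq" names the count column
top_urls <- df %>%
  count(urls, name = "freq", sort = TRUE) %>%
  head(10)

print(top_urls)
```

With mutate(), every original row is kept and duplicates of a frequent URL can fill all 10 slots; count() avoids that by summarising first.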

Ordering by occurrence or frequency in a data frame in R?
