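My table is full of character labels and numbers, and I only want to keep the 3 highest frequencies together with their labels. From the table below, I want a result that contains only AZ 520, then AE 488, then AU 399.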
Var1 Freq
1 AE 488
2 AR 12
3 AU 399
4 AW 56
5 AZ 520
6 BA 2
7 BB 84
8 BG 246
9 BH 85
10 BI 6
as.data.frame(table(training.data.raw$destinationcountry))
Answer 0 (score: 2)
Recreating the data as follows, assuming the columns are named name and value:
# data_frame() is deprecated; tibble() is its replacement and is re-exported by dplyr
training.data.raw <- dplyr::tibble(name = c("IN", "IS", "IT", "JO", "JP", "KZ", "MA", "MZ", "NG", "NO", "NZ", "PE", "PH", "PR", "RO", "RU", "SA", "SE", "SY", "TM", "TN", "TR", "UK", "US", "WS"),
value = c(999, 1, 1885, 1098, 2, 584, 858, 11, 10, 522, 193, 29, 2, 1, 1603, 353, 6, 2, 4, 33, 228, 3201, 852, 1363, 1))
You can easily get the desired result with the top_n function from the dplyr package (details in the help file, ?top_n):
library(dplyr)
# With wt omitted, top_n() ranks by the last column, which is value here
top_3 <- top_n(x = training.data.raw, n = 3)
top_3
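Note that top_n() keeps the selected rows in their original order rather than sorting them, while the question asks for AZ, then AE, then AU. A minimal sketch of one way to get that order, assuming the question's frequency table is rebuilt by hand as freq_table (a name made up for this illustration) with the Var1 and Freq columns shown in its output:

```r
library(dplyr)

# Hypothetical re-creation of the question's as.data.frame(table(...)) output
freq_table <- data.frame(
  Var1 = c("AE", "AR", "AU", "AW", "AZ", "BA", "BB", "BG", "BH", "BI"),
  Freq = c(488, 12, 399, 56, 520, 2, 84, 246, 85, 6),
  stringsAsFactors = FALSE
)

# Keep the 3 rows with the largest Freq, then sort them from largest to smallest
top_3 <- arrange(top_n(freq_table, n = 3, wt = Freq), desc(Freq))
top_3
```

In dplyr 1.0.0 and later, top_n() is documented as superseded by slice_max(), so newer code may prefer that function.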
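Edit based on comments: if the names are stored as a factor rather than a regular character vector, you can first convert them to character with mutate: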
training.data.characters <- mutate(training.data.raw, name = as.character(name))
# Now top_n() will take it
# Can also explicitly state the wt argument to tell it to rank by value
top_3 <- top_n(x = training.data.characters, n = 3, wt = value)
top_3
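If loading dplyr is not an option, roughly the same result can be had in base R. This is only a sketch: countries is a made-up stand-in for training.data.raw$destinationcountry, filled with a few of the counts from the question, and the column names are set by hand to match the question's output.

```r
# Hypothetical stand-in for training.data.raw$destinationcountry
countries <- rep(c("AE", "AR", "AU", "AW", "AZ"), times = c(488, 12, 399, 56, 520))

# Tabulate, then convert to a data frame with plain character labels (no factors)
freq <- as.data.frame(table(countries), stringsAsFactors = FALSE)
names(freq) <- c("Var1", "Freq")

# Order by descending frequency and keep the first 3 rows
top_3 <- head(freq[order(freq$Freq, decreasing = TRUE), ], 3)
top_3
```

Because the labels are converted to character when the table is turned into a data frame, the factor issue mentioned in the edit above does not come up here.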