Question

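I need to create a table that lists each country's case count on its most recently observed date, then print the country name and case count for the 5 countries with the most cases.

Here is an example of what the data looks like: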

    Country       Date Confirmed Recovered Deaths
1   Algeria 2020-01-22         0         0      0
2   Algeria 2020-01-23         0         0      0
3   Algeria 2020-01-24         0         0      0
4   Algeria 2020-01-25         0         0      0
5   Algeria 2020-01-26         0         0      0
6   Algeria 2020-01-27         0         0      0
7   Algeria 2020-01-28         0         0      0
8   Algeria 2020-01-29         0         0      0

(There are other countries as well)

Update:

I used the following to get the dates and confirmed cases in order, but I am still struggling to print only the top 5 countries:

by_country_top5 = Africa_covid %>%
  mutate(Date = as.Date(Date, '%m/%d/%Y')) %>%
  group_by(Country) %>%
  arrange(desc(Date), desc(Confirmed)) %>%
  select(Country, Date, Confirmed)
by_country_top5

Answer 1

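If the data frame is df, the country column is called country, the cases column is called cases, and the date column is called date:

library(dplyr)

# For each country, find the row index (within df) of its most recent date
topDates = df$country %>%
  unique %>%
  lapply(function(x) {
    rows = which(df$country == x)
    rows[which.max(as.Date(df$date[rows]))]
  }) %>% unlist

# Keep each country's latest row, then the 5 countries with the most cases
Top5 = df[topDates, ] %>% arrange(desc(cases)) %>% head(5)
Top5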

Answer 2

After sorting and grouping, you can use slice to get the first n rows of each group.
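A minimal sketch of the sort, group, and slice approach, assuming the column names from the question (Country, Date, Confirmed) and ISO-formatted dates as in the sample output. The rows below are made up for illustration, and the country labels A to F are placeholders, not real data:

```r
library(dplyr)

# Made-up sample rows for illustration only (not real figures)
Africa_covid = data.frame(
  Country   = rep(c("A", "B", "C", "D", "E", "F"), each = 2),
  Date      = rep(c("2020-01-22", "2020-01-23"), times = 6),
  Confirmed = c(0, 5, 1, 9, 2, 3, 0, 7, 4, 1, 6, 8)
)

by_country_top5 = Africa_covid %>%
  mutate(Date = as.Date(Date)) %>%   # assumes ISO dates (YYYY-MM-DD)
  arrange(desc(Date)) %>%            # newest rows first
  group_by(Country) %>%
  slice(1) %>%                       # first row per country = latest date
  ungroup() %>%
  arrange(desc(Confirmed)) %>%       # rank countries by latest case count
  slice(1:5) %>%                     # keep the top 5
  select(Country, Confirmed)

print(by_country_top5)
```

With this sample, the result keeps B, F, D, A, and C (9, 8, 7, 5, and 3 confirmed cases) and drops E. The ungroup() call matters: without it, the final slice(1:5) would still be applied per country rather than to the whole table.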
