Getting dates from a dataset

Time: 2020-06-02 16:43:35

Tags: r

df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')
df$countryName <- as.character(df$countryName)

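I have processed the dataset.

How can I find the countries that reported the most patients, deaths, and recoveries on a given day?

Example:

For June 1, 2020: the country that reported the most deaths, the country that reported the most cases, and the country that reported the most recoveries.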

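1 answer:

Answer 0: (score: 1)

The code below uses the dplyr R package to create a data frame named records that contains the requested data. Make sure dplyr is installed by running install.packages("dplyr") in R or RStudio.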

## call the dplyr library
library(dplyr)
## read in your data to R
df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv', stringsAsFactors = FALSE)
## set the date you wish to query max records for
set.date <- "2020-06-01"
## copy the data to preserve the original
df1 <- df
## filter the records to only those that match the specified date
## (the date column in this data set is named "day")
df1 <- filter(df1, as.Date(day, "%Y/%m/%d") == as.Date(set.date))
## determine which country had the most confirmed on the specified day
max.confirmed <- df1[which.max(df1$confirmed),]
## format the record to identify it as the record with most confirmed
max.confirmed$confirmed <- paste0("**",max.confirmed$confirmed,"**")
## determine which country had the most deaths on the specified day
max.deaths <- df1[which.max(df1$death),]
## format the record to identify it as the record with most deaths
max.deaths$death <- paste0("**",max.deaths$death,"**")
## determine which country had the most recovered on the specified day
max.recovered <- df1[which.max(df1$recovered),]
## format the record to identify it as the record with most recovered
max.recovered$recovered <- paste0("**",max.recovered$recovered,"**")
## create the records data frame to contain your max records
records <- rbind(max.confirmed, max.deaths, max.recovered)

To query a different day, replace "2020-06-01" with the date whose maximum records you want. Make sure to use the "YYYY-MM-DD" format.
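
The steps above can also be wrapped in a small function so that the date becomes an argument instead of an edit to the script. This is only a minimal sketch in base R, assuming the date column is named day and the count columns are confirmed, death, and recovered. The function name max_by_day and the toy data frame are hypothetical and exist only for illustration.

```r
## Hypothetical helper: for one date, return the row with the highest value
## in each of the given count columns.
max_by_day <- function(df, query.date,
                       metrics = c("confirmed", "death", "recovered")) {
  ## as.Date() tries both "2020-06-01" and "2020/06/01" by default
  day <- as.Date(df$day)
  one.day <- df[which(day == as.Date(query.date)), , drop = FALSE]
  if (nrow(one.day) == 0) {
    stop("no rows found for ", query.date)
  }
  ## one row per metric, labelled with the metric it is the maximum of
  rows <- lapply(metrics, function(m) {
    top <- one.day[which.max(one.day[[m]]), , drop = FALSE]
    data.frame(metric = m, top, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

## Made-up numbers, not real data
toy <- data.frame(
  day = c("2020/05/31", "2020/06/01", "2020/06/01", "2020/06/01"),
  countryName = c("A", "A", "B", "C"),
  confirmed = c(90, 100, 300, 200),
  recovered = c(40, 50, 60, 150),
  death = c(4, 5, 20, 10),
  stringsAsFactors = FALSE
)
result <- max_by_day(toy, "2020-06-01")
print(result[, c("metric", "countryName")])
```

With the toy data, country B has the most confirmed cases and deaths on 2020-06-01 and country C has the most recoveries. With the real data, the same call would take the df read from the CSV in place of toy.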

Alternatively, you could use the readline() function to ask the user for the date they want to query, rather than editing the code by hand.
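
A rough sketch of the readline() idea follows. The helper name parse_query_date and the prompt text are hypothetical. Because readline() only waits for input in an interactive session, the prompt is guarded with interactive() and a fixed date is used otherwise.

```r
## Hypothetical helper: convert user input to a Date and stop with an error
## if it cannot be parsed as YYYY-MM-DD.
parse_query_date <- function(input) {
  parsed <- as.Date(trimws(input), format = "%Y-%m-%d")
  if (is.na(parsed)) {
    stop("please enter a date as YYYY-MM-DD, got: ", input)
  }
  parsed
}

if (interactive()) {
  ## readline() returns what the user types as a character string
  set.date <- parse_query_date(readline("Date to query (YYYY-MM-DD): "))
} else {
  ## non-interactive runs (for example Rscript) fall back to a fixed date
  set.date <- parse_query_date("2020-06-01")
}
```

The resulting set.date can then be used in the filter step in place of the hard-coded string.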

Added (based on comments): If you want to use today's data (or the most recent data, if today's is not yet available), you can use the following code:

## call the dplyr library
library(dplyr)
## read the data into R
df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv', stringsAsFactors = FALSE)
## determine the max date contained within the data
max.date <- df[which.max(as.Date(df$day)),"day"]
## copy the data to preserve original
df1 <- df
## filter the data to only entries from the max day
## (the date column in this data set is named "day")
df1 <- filter(df1, as.Date(day, "%Y/%m/%d") == as.Date(max.date))
## determine the entry with the most deaths
max.deaths <- df1[which.max(df1$death),]
## format the number of deaths as given in the example
max.deaths$death <- paste0("**",max.deaths$death,"**")
## determine the entry with the most recovered
max.recovered <- df1[which.max(df1$recovered),]
## format the number recovered to match the format of the example
max.recovered$recovered <- paste0("**",max.recovered$recovered,"**")
## create a data frame containing our max death and max recovered entries
max.records <- rbind(max.deaths, max.recovered)
## make sure the day column holds the max date of the entries selected
max.records$day <- max.date
## organize the data as shown in the example
max.records <- select(max.records, day, countryName, death, recovered)
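
One caveat about "reported in a day": if the count columns hold running totals rather than daily figures (an assumption worth checking against the data), the code above finds the country with the highest total as of that day. A minimal base R sketch for deriving per-day increases is shown below. The helper name daily_increase, the new.death column it creates, and the toy numbers are all hypothetical.

```r
## Hypothetical helper: add the per-day increase for one cumulative column.
## Assumes one row per country per day.
daily_increase <- function(df, column) {
  df$day <- as.Date(df$day)
  ## sort so that each country's rows are in date order
  df <- df[order(df$countryName, df$day), ]
  ## within each country: the first day keeps its total,
  ## later days get the difference from the previous day
  df[[paste0("new.", column)]] <- ave(
    df[[column]], df$countryName,
    FUN = function(x) c(x[1], diff(x))
  )
  df
}

## Made-up numbers, not real data
toy <- data.frame(
  day = c("2020/05/31", "2020/06/01", "2020/05/31", "2020/06/01"),
  countryName = c("A", "A", "B", "B"),
  death = c(100, 105, 10, 30),
  stringsAsFactors = FALSE
)
with.new <- daily_increase(toy, "death")
on.day <- with.new[with.new$day == as.Date("2020-06-01"), ]
## highest running total on that day
top.total <- on.day[which.max(on.day$death), "countryName"]
## largest increase on that day
top.new <- on.day[which.max(on.day$new.death), "countryName"]
```

In the toy data, country A has the larger running total on 2020-06-01, while country B has the larger one-day increase, so the two questions can have different answers.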

I hope this helps!