#This data set contains countries and continents.
df <- read.csv ('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')
#This data set contains countries and population information.
df8 <- read.csv ('https://raw.githubusercontent.com/hirenvadher954/Worldometers-Scraping/master/countries.csv')
library(dplyr)
library(stringr)
df %>%
left_join(df8, by = c("countryName" = "country_name")) %>%
mutate(population = as.numeric(str_remove_all(population, ","))) %>%
group_by(countryName) %>%
slice_tail(1) %>%
group_by(region) %>%
summarize(population = sum(population, na.rm = TRUE))
Error in df %>% left_join(df8, by = c(countryName = "country_name")) %>% : could not find function "%>%"
I get this error. Can you explain why it happens and suggest a solution?
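How can I combine the continent information in data set 1 with the population information in data set 2?
For example: Asia 2.8 billion, Africa 800 million, Europe 1 billion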
Answer 0: (score: 1)
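You have a few problems here:
1) When the data is read with read.csv, the country names are treated as factors (this was the default in R versions before 4.0.0); you can pass the argument stringsAsFactors = FALSE to keep them as character strings.
2) slice_tail: I don't know where this comes from; are you looking for dplyr::slice?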
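As a side note on point 2, here is a minimal sketch of the difference between taking the first and the last row per group. The data frame `toy` and its columns `day` and `confirmed` are hypothetical and only serve as an illustration. As far as I know, `slice_tail()` was added in dplyr 1.0.0 and expects its count by name (`slice_tail(n = 1)`), so on older versions `slice(n())` is the portable way to get the last row.

```r
library(dplyr)

# Hypothetical toy data: two reporting days per country
toy <- data.frame(
  countryName = c("A", "A", "B", "B"),
  day         = c(1, 2, 1, 2),
  confirmed   = c(10, 20, 5, 7),
  stringsAsFactors = FALSE
)

# First row of each country
first_rows <- toy %>%
  group_by(countryName) %>%
  slice(1) %>%
  ungroup()

# Last row of each country; n() is the group size, so this works
# on dplyr versions that do not have slice_tail() yet
last_rows <- toy %>%
  group_by(countryName) %>%
  slice(n()) %>%
  ungroup()

print(first_rows)
print(last_rows)
```

For the population total, either row works as long as the population value is the same on every row of a country; the choice only matters for columns that change over time.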
df <- read.csv ('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv',
stringsAsFactors = FALSE)
#This data set contains countries and population information.
df8 <- read.csv ('https://raw.githubusercontent.com/hirenvadher954/Worldometers-Scraping/master/countries.csv',
stringsAsFactors = FALSE)
library(dplyr)
library(stringr)
df %>%
left_join(df8, by = c("countryName" = "country_name")) %>%
mutate(population = as.numeric(str_remove_all(population, ","))) %>%
group_by(countryName) %>%
slice(1) %>%
group_by(region) %>%
summarize(population = sum(population, na.rm = TRUE))
This gives you:
## # A tibble: 5 x 2
##   region   population
##   <chr>         <dbl>
## 1 Africa   1304908713
## 2 Americas 1019607512
## 3 Asia     4592311527
## 4 Europe    738083720
## 5 Oceania    40731992
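Regarding the error message quoted in the question: `%>%` is not part of base R. It comes from the magrittr package and is re-exported by dplyr, so "could not find function" usually means that `library(dplyr)` was not run in the current session, or that it failed because the package is not installed. That is only one likely cause, not a diagnosis of the asker's session. A minimal sketch of a check:

```r
# Stop early with a clear message if dplyr is not installed
if (!requireNamespace("dplyr", quietly = TRUE)) {
  stop("dplyr is not installed; run install.packages('dplyr') first")
}

# Attaching dplyr makes the pipe operator available
library(dplyr)

# TRUE once a package exporting %>% is attached
pipe_found <- exists("%>%")

# Small smoke test of the pipe: sqrt of 1, 4, 9 is 1, 2, 3
result <- c(1, 4, 9) %>% sqrt() %>% sum()

print(pipe_found)
print(result)
```

If `library(dplyr)` itself fails, the lines after it still run in an interactive session, which is how the pipe error can show up even though the script contains the `library` call.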