Question

#This data set contains countries and continents.

df <- read.csv ('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')

#This data set contains countries and population information.

df8 <- read.csv ('https://raw.githubusercontent.com/hirenvadher954/Worldometers-Scraping/master/countries.csv')


library(dplyr)
library(stringr)

df %>% 
    left_join(df8, by = c("countryName" = "country_name")) %>% 
    mutate(population = as.numeric(str_remove_all(population, ","))) %>% 
    group_by(countryName) %>%
    slice_tail(1) %>%
    group_by(region) %>% 
    summarize(population = sum(population, na.rm = TRUE))

Error in df %>% left_join(df8, by = c(countryName = "country_name")) %>% : could not find function "%>%"

I get this error. Can you explain the cause and provide a solution?

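How can I combine the continent information in data set 1 with the population information in data set 2?

For example: Asia 2.8 billion, Africa 800 million, Europe 1 billion.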

Answer 1

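You have a few problems here:

1) When the data is read with read.csv, the country names are treated as factors (the default before R 4.0.0); you can fix this with the argument stringsAsFactors = FALSE.

2) slice_tail: I don't know where this comes from; are you looking for dplyr::slice?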
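The effect of stringsAsFactors can be checked offline with a minimal sketch. The two-row CSV text below is a made-up stand-in for the real files, and the object names are hypothetical; only read.csv and its stringsAsFactors argument come from the answer.

```r
# Hypothetical two-row CSV standing in for the real downloads.
csv_text <- "countryName,region\nTurkey,Asia\nFrance,Europe"

# Same text read twice, differing only in stringsAsFactors.
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_char   <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$countryName)  # "factor"
class(as_char$countryName)    # "character"
```

Joining on a character column avoids the factor-level mismatches that can appear when the two key columns were read as factors.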


df <- read.csv ('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv',
                stringsAsFactors = FALSE)

#This data set contains countries and population information.

df8 <- read.csv ('https://raw.githubusercontent.com/hirenvadher954/Worldometers-Scraping/master/countries.csv',
                 stringsAsFactors = FALSE)


library(dplyr)
library(stringr)

df %>% 
    left_join(df8, by = c("countryName" = "country_name")) %>% 
    mutate(population = as.numeric(str_remove_all(population, ","))) %>% 
    group_by(countryName) %>%
    slice(1) %>%
    group_by(region) %>% 
    summarize(population = sum(population, na.rm = TRUE))

This gives you:

df
## # A tibble: 5 x 2
##   region   population
##   <chr>         <dbl>
## 1 Africa   1304908713
## 2 Americas 1019607512
## 3 Asia     4592311527
## 4 Europe    738083720
## 5 Oceania    40731992
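
The same join-then-aggregate pattern can be tried without a network connection on a tiny made-up table. This is only a sketch: the data frames `reports` and `pops` are hypothetical stand-ins for df and df8, and only the column names are taken from the code above. dplyr is attached first, because the message could not find function "%>%" is what R typically reports when the pipe is used before dplyr (or magrittr) has been attached.

```r
library(dplyr)    # provides %>%, left_join(), group_by(), slice(), summarize()
library(stringr)  # provides str_remove_all()

# Hypothetical stand-in for df: several report rows per country.
reports <- data.frame(
  countryName = c("A", "A", "B", "C"),
  region      = c("Asia", "Asia", "Asia", "Europe"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for df8: population stored as text with commas.
pops <- data.frame(
  country_name = c("A", "B", "C"),
  population   = c("1,000", "2,500", "300"),
  stringsAsFactors = FALSE
)

by_region <- reports %>%
  left_join(pops, by = c("countryName" = "country_name")) %>%
  mutate(population = as.numeric(str_remove_all(population, ","))) %>%
  group_by(countryName) %>%
  slice(1) %>%                # keep one row per country so it is counted once
  group_by(region) %>%
  summarize(population = sum(population, na.rm = TRUE))

by_region
```

In this toy data Asia sums to 3500 and Europe to 300. As far as I know, dplyr 1.0.0 and later also offer slice_tail(n = 1), which keeps the last row per country instead of the first; the count has to be passed by name.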
