Combining information from different datasets

Time: 2020-05-04 15:20:56

Tags: r

#Countries and continents are in this data set.

df <- read.csv ('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')

#This data set contains countries and population information.

df8 <- read.csv ('https://raw.githubusercontent.com/hirenvadher954/Worldometers-Scraping/master/countries.csv')


library(dplyr)
library(stringr)

df %>% 
    left_join(df8, by = c("countryName" = "country_name")) %>% 
    mutate(population = as.numeric(str_remove_all(population, ","))) %>% 
    group_by(countryName) %>%
    slice_tail(1) %>%
    group_by(region) %>% 
    summarize(population = sum(population, na.rm = TRUE)) 

Running df %>% left_join(df8, by = c("countryName" = "country_name")) %>% ... gives this error:

Error: could not find function "%>%"

Can you explain why this happens and how to fix it?
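A likely explanation, sketched in R: %>% is not a base R operator. It is defined in the magrittr package and re-exported by dplyr, so this error usually means that neither package was attached in the session when the pipeline ran, for example because the library(dplyr) line was not executed or failed because dplyr is not installed. The snippet below only checks for the operator and attaches dplyr when it is available; it does not install anything.

```r
# %>% is not part of base R: it comes from magrittr and is re-exported by dplyr,
# so it only exists after one of those packages has been attached.
pipe_available <- exists("%>%")
pipe_available  # typically FALSE in a fresh session with no packages attached

# requireNamespace() only checks that dplyr is installed;
# library() then attaches it, which makes %>% visible.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
} else {
  message("dplyr is not installed; run install.packages(\"dplyr\") first")
}
```

Running library(dplyr) before the pipeline (and making sure it succeeds) is what makes %>% available.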

How can I combine the continent information from data set 1 with the population information from data set 2?

For example: Asia 2.8 billion, Africa 800 million, Europe 1 billion.
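The join-then-sum idea behind the question can be sketched in base R on tiny made-up data, which avoids any package dependency. The country names, regions and population strings below are hypothetical stand-ins for the two CSV files: merge() plays the role of left_join(), duplicated() keeps one row per country, and aggregate() sums by region.

```r
# Hypothetical toy versions of the two data sets (values are made up)
cases <- data.frame(
  countryName = c("A", "A", "B", "C", "C"),   # several report rows per country
  region      = c("Asia", "Asia", "Asia", "Europe", "Europe"),
  stringsAsFactors = FALSE
)
pops <- data.frame(
  country_name = c("A", "B", "C"),
  population   = c("1,000", "2,500", "400"),  # comma-formatted text, as in countries.csv
  stringsAsFactors = FALSE
)

# Left join on the country name (base R counterpart of dplyr::left_join)
joined <- merge(cases, pops, by.x = "countryName", by.y = "country_name", all.x = TRUE)

# Strip the thousands separators and convert to numbers
joined$population <- as.numeric(gsub(",", "", joined$population, fixed = TRUE))

# Keep one row per country so each population is counted only once
per_country <- joined[!duplicated(joined$countryName), ]

# Sum the populations within each region
totals <- aggregate(population ~ region, data = per_country, FUN = sum)
totals
```

Deduplicating before summing matters: without it, every daily report row of a country would add that country's population again.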

1 answer:

Answer 0 (score: 1)

You have a couple of problems here:

1) When you read the data with read.csv, the country and region columns are read as factors; you can avoid this with the argument stringsAsFactors = FALSE.

2) I'm not sure where slice_tail comes from; were you looking for dplyr::slice?
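What stringsAsFactors changes can be seen with a small read.csv(text = ...) call on made-up CSV text (the two rows are hypothetical). The second half shows a general trap with factors, not necessarily what broke this pipeline: as.numeric() on a factor returns the internal level codes rather than the numbers written in the text. Note that from R 4.0.0 onwards stringsAsFactors = FALSE is already the default.

```r
# Hypothetical CSV text standing in for countries.csv
csv_text <- 'country_name,population
A,"1,500"
B,"20"'

# With stringsAsFactors = FALSE the text columns stay character vectors
df8_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(df8_chr)

# Remove the thousands separators, then convert to numbers
pop <- as.numeric(gsub(",", "", df8_chr$population, fixed = TRUE))
pop

# Factor trap: as.numeric() gives the level codes, not the values
pop_factor <- factor(c("1500", "20"))
as.numeric(pop_factor)                # level codes
as.numeric(as.character(pop_factor))  # go through character to get the values
```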


df <- read.csv ('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv',
                stringsAsFactors = FALSE)

#This data set contains countries and population information.

df8 <- read.csv ('https://raw.githubusercontent.com/hirenvadher954/Worldometers-Scraping/master/countries.csv',
                 stringsAsFactors = FALSE)


library(dplyr)
library(stringr)

df %>% 
  left_join(df8, by = c("countryName" = "country_name")) %>% 
  mutate(population = as.numeric(str_remove_all(population, ","))) %>% 
  group_by(countryName) %>%
  slice(1) %>%
  group_by(region) %>% 
  summarize(population = sum(population, na.rm = TRUE))

This gives you:
## # A tibble: 5 x 2
##   region   population
##   <chr>         <dbl>
## 1 Africa   1304908713
## 2 Americas 1019607512
## 3 Asia     4592311527
## 4 Europe    738083720
## 5 Oceania    40731992
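The pipeline above keeps the first row per country with slice(1), while the question used slice_tail(). slice_tail() was added in dplyr 1.0.0 and takes the row count as a named argument, slice_tail(n = 1). Since the joined population is the same on every row of a given country, keeping the first or the last row gives the same regional totals. A dependency-free base R sketch of both choices, on hypothetical daily data:

```r
# Hypothetical daily report: several rows per country, ordered by day
daily <- data.frame(
  countryName = c("A", "A", "B", "B", "B"),
  day         = c(1, 2, 1, 2, 3),
  stringsAsFactors = FALSE
)

# First row per country, similar to group_by(countryName) %>% slice(1)
first_rows <- daily[!duplicated(daily$countryName), ]

# Last row per country, similar to group_by(countryName) %>% slice_tail(n = 1)
last_rows <- daily[!duplicated(daily$countryName, fromLast = TRUE), ]

first_rows
last_rows
```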