tidyr's nest() function in R fails to recognize a variable and prints: "Warning message: Unknown or uninitialised column"

Time: 2018-07-23 16:35:37

Tags: r nested initialization tidyr tibble

I am working with a dataset that has a column of country codes named "ccode":

[image: votes tibble]

To add a column of country names called "country", I used the countrycode() function from the countrycode package (installed from CRAN) as follows:

votes_processed <- votes %>%
  filter(vote <= 3) %>%
  mutate(year = session + 1945,
         country = countrycode(ccode,"cown","country.name"))

This produced the following warning message:

Warning message:
In countrycode(ccode, "cown", "country.name") :
  Some values were not matched unambiguously: 260, 816

[image: country votes tibble]

Because no country name could be assigned to these country codes, I filtered them out of the data frame:

> table(is.na(votes_processed$country))

 FALSE   TRUE 
350844   2703 
> votes_processed <- filter(votes_processed,!is.na(country))
> table(is.na(votes_processed$country))

 FALSE 
350844 

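After that, I ran the following command to create another tibble with the total number of votes and the proportion of "yes" votes (1 = yes), grouped by year and country: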

# Group by year and country: by_year_country
by_year_country <- votes_processed %>%
  group_by(year,country) %>%
  summarize(total = n(),
            percent_yes = mean(vote == 1))

[image: by_year_country tibble]

Then I ran the following command to nest the data by country. The console issued the warning below and my country column was dropped:

> nested <- by_year_country %>%
+   nest(-country)
Warning message:
Unknown or uninitialised column: 'country'. 

[image: nested tibble]

> nested$country
NULL
Warning messages:
1: Unknown or uninitialised column: 'country'. 
2: Unknown or uninitialised column: 'country'. 

Can someone explain what is happening to the "country" column, why R does not recognize it, and what I should do about it?

I am a beginner on this platform. I received a comment asking for a sample of the data, so I am pasting it here:

rcid<-c(5168,4317,3598,2314,1220,5024,3151,2042,2513,238,4171,3748,2595,
        5160,4476,308,3621,874,2025,3793,3595,1191,987,1207,2255,211,
        2585,2319,3590,189)
session<- c(66,56,46,36,26,64,42,34,38,4,54,48,38,66,58,6,46,18,34,
            48,46,26,22,26,36,4,38,36,46,4)
vote<- c(1,8,1,8,9,1,3,2,2,9,2,1,3,1,1,1,1,1,1,1,1,1,9,2,1,9,1,1,1,2)
ccode<-as.integer(c(816,816,816,816,816,816,260,260,260,260,2,42,2,20,
                    31,41,20,42,41,31,70,95,80,93,58,51,53,90,55,90))

sample_data_votes<-data.frame("rcid"=rcid,"session"=session, "vote"= vote,
                              "ccode"=ccode)

Thank you very much for your time and advice.

2 answers:

Answer 0: (score: 2)

by_year_country is still grouped, so you need to ungroup it before nesting:

library(tidyverse)
by_year_country %>% ungroup() %>% 
                     nest(-country) %>% head(n=2)

# A tibble: 2 x 2
  country   data            
 <chr>     <list>          
1 Guatemala <tibble [2 x 3]>
2 Haiti     <tibble [2 x 3]>
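
As a rough sketch of the same fix on self-contained data: the toy table below uses made-up values and a hypothetical name (by_year_country_toy), and it is grouped by year to mimic what summarize() leaves behind. The named form nest(data = -country) assumes tidyr 1.0.0 or later, where passing unnamed columns to nest() is deprecated. On older versions the nest(-country) call above is the equivalent.

```r
library(dplyr)
library(tidyr)

# Toy summary table (made-up values), still grouped by year,
# mimicking the state left by group_by(year, country) %>% summarize()
by_year_country_toy <- tibble(
  year        = c(1947, 1949, 1949, 1951),
  country     = c("Haiti", "Guatemala", "Haiti", "Guatemala"),
  total       = c(2L, 2L, 1L, 1L),
  percent_yes = c(0.5, 1, 1, 0)
) %>%
  group_by(year)

# Drop the leftover grouping first, then nest everything except country
nested_toy <- by_year_country_toy %>%
  ungroup() %>%
  nest(data = -country)   # named form; assumes tidyr >= 1.0.0

names(nested_toy)         # the country column survives alongside the list column
nested_toy$country
```

Each element of nested_toy$data is a tibble holding the year, total and percent_yes rows for one country.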

Answer 1: (score: 1)

It looks like you need to remove the -country part from the call to nest:

library(dplyr)
library(tidyr)
library(countrycode)
rcid<-c(5168,4317,3598,2314,1220,5024,3151,2042,2513,238,4171,3748,2595,
        5160,4476,308,3621,874,2025,3793,3595,1191,987,1207,2255,211,
        2585,2319,3590,189)
session<- c(66,56,46,36,26,64,42,34,38,4,54,48,38,66,58,6,46,18,34,
            48,46,26,22,26,36,4,38,36,46,4)
vote<- c(1,8,1,8,9,1,3,2,2,9,2,1,3,1,1,1,1,1,1,1,1,1,9,2,1,9,1,1,1,2)
ccode<-as.integer(c(816,816,816,816,816,816,260,260,260,260,2,42,2,20,
                    31,41,20,42,41,31,70,95,80,93,58,51,53,90,55,90))

votes<-data.frame("rcid"=rcid,"session"=session, "vote"= vote,
                              "ccode"=ccode)
votes_processed <- votes %>%
  filter(vote <= 3) %>%
  mutate(year = session + 1945,
         country = countrycode(ccode,"cown","country.name")) %>% 
  filter(!is.na(country))

by_year_country <- votes_processed %>%
  group_by(year,country) %>%
  summarize(total = n(),
            percent_yes = mean(vote == 1))

nested <- by_year_country %>%
  nest()

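Passing -country tells nest to use everything except country. By default, nest uses all columns except the grouping columns. by_year_country is a tibble grouped by year: the summarize call dropped one level of grouping, so it is no longer grouped by country but is still grouped by year. Calling nest() without arguments therefore nests by year, not by country. If you want to remove the grouping, use ungroup().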
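
To make the grouping behaviour concrete, here is a minimal sketch on made-up data (votes_toy and the other *_toy names are hypothetical, not from the question). It checks which grouping variable survives summarize() and what a bare nest() then produces. It assumes current dplyr and tidyr releases, where summarize() drops the last grouping level by default.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for votes_processed (values are made up for illustration)
votes_toy <- tibble(
  year    = c(1947, 1947, 1949, 1949, 1949, 1951),
  country = c("Haiti", "Haiti", "Haiti", "Guatemala", "Guatemala", "Guatemala"),
  vote    = c(1, 2, 1, 1, 1, 3)
)

by_year_country_toy <- votes_toy %>%
  group_by(year, country) %>%
  summarize(total = n(),
            percent_yes = mean(vote == 1))

# summarize() peels off the last grouping variable (country),
# so only year is left as a grouping variable
group_vars(by_year_country_toy)

# A bare nest() on the grouped tibble keeps the grouping column as the key
# and moves every other column, including country, into the list column
nested_by_year <- by_year_country_toy %>%
  nest()

names(nested_by_year)
names(nested_by_year$data[[1]])
```

Inspecting group_vars() before nesting is a quick way to see whether an ungroup() is needed.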