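I'm having trouble with the dplyr library. I've been trying to run some fairly simple code, but for some reason, when I group by a variable and sum another column to get the total per group, I only get NA values. Here are my files: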
https://www.dropbox.com/sh/zhxfj6cm6gru0t1/AAA-DgeTrngJ0md12W2bEzi0a
Here is the code:
library(dplyr)
#we set the working directory
setwd("~/asado/R/emp")
##we list the files
list.files()
##we load the csv files
emp1 <- read.csv("AI_EMP_CT_A.csv", sep=',')
##emp1 contains employment information for US counties with naics classification
##empva is another part of the same dataset
empva <- read.csv("AI_EMP_CT_VA_A.csv", sep=',')
##we merge our files, they have the same columns so rbind works
emp <- data.frame(rbind(emp1, empva))
##we create a variable to summarize our data
##and make sure it is stored as character
emp$naics <- as.character(substring(emp$Mnemonic,3,6))
##we try to summarize by the variable naics, summing for Dec.2013
useemp <- emp %.% group_by(naics) %.%
  summarize(total = sum(Dec.2013, na.rm = TRUE))
##the resulting dataframe shows NA
head(useemp)
Any idea what's going on?
Answer 0 (score: 2)
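This works for me, but reading your empva file was tricky: the last column, Dec.2013, is filled with ; characters and is not cleanly separated. Are you sure it is being read as numeric?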
useemp <- emp %>% group_by(naics) %>%
  summarize(total = sum(Dec.2013, na.rm = TRUE))
head(useemp)
Source: local data frame [6 x 2]
naics total
1 2111 132.04674
2 2121 24.84666
3 2122 23.90470
4 2123 17.57697
5 2131 77.20557
6 2211 119.30697
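
To check whether the column really is numeric, something like the sketch below might help. It is only an illustration: the small `emp` data frame is a made-up stand-in for the real files (the values are invented), and it assumes the stray characters are trailing semicolons, which may not match your data exactly. The grouped sum uses base R's `aggregate` so the snippet runs without extra packages; once the column is numeric, the dplyr pipeline above should behave the same way.

```r
# Hypothetical stand-in for the combined data; the values are invented.
emp <- data.frame(
  naics    = c("2111", "2111", "2121"),
  Dec.2013 = c("1.5", "2.5;", "3.0;"),
  stringsAsFactors = FALSE
)

# "character" or "factor" here means the column was not parsed as numbers.
print(class(emp$Dec.2013))

# Assumption: the only junk is literal semicolons. Strip them, then convert.
emp$Dec.2013 <- as.numeric(gsub(";", "", emp$Dec.2013, fixed = TRUE))

# Grouped sum in base R, one row per naics code.
totals <- aggregate(Dec.2013 ~ naics, data = emp, FUN = sum)
print(totals)
```

If `class()` already reports "numeric" on your real data, the problem lies elsewhere and this cleanup is not needed.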