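I'm importing a dataset with openxlsx and running na.omit to remove NAs. When I run it, the "Yes/No" field has all of its "Yes" values changed to "No", leaving the entire column with the same value.
You'll see in the code that I'm doing a few other things to clean up the data, but I can't tell whether any of that affects the na.omit problem I'm running into.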
library(openxlsx)
library(sqldf)
# Import survey data from file
projsurvey <- read.xlsx("SatisfactionSurvey2_2.xlsx", sheet = 1)
# Make field names SQL friendly by removing spaces and special characters/punctuation
fields <- colnames(projsurvey)
fields <- gsub(" ","",fields)
fields <- gsub("\\.","",fields)
fields[8] <- "PercentofFlightWithOtherAirlines"
colnames(projsurvey) <- fields
# clean up Satisfaction data and convert field to numeric
# sqldf("select distinct Satisfaction from projsurvey") # to identify bad data
projsurvey <- projsurvey[projsurvey$Satisfaction!="4.00.2.00",]
projsurvey <- projsurvey[projsurvey$Satisfaction!="4.00.5",]
projsurvey$Satisfaction <- as.numeric(as.character(projsurvey$Satisfaction)) # change field type to numeric
# sqldf("select distinct Satisfaction from projsurvey")
# Change 'yes' or 'no' fields to 1 and 0 indicator
# projsurvey[,"Flightcancelled"] <- sqldf("select case when lower(Flightcancelled) = 'yes' then 1 when lower(Flightcancelled) = 'no' then 0 end from projsurvey")
# projsurvey[,"ArrivalDelaygreater5Mins"] <- sqldf("select case when lower(ArrivalDelaygreater5Mins) = 'yes' then 1 when lower(ArrivalDelaygreater5Mins) = 'no' then 0 end from projsurvey")
# Change TypeofTravel field to indicators
# projsurvey[,"TypeofTravel"] <- sqldf("select case when lower(TypeofTravel) = 'mileage tickets' then 1 when lower(TypeofTravel) = 'business travel' then 2 when lower(TypeofTravel) = 'personal travel' then 3 end from projsurvey")
# remove NAs from data
nrow(projsurvey) - nrow(na.omit(projsurvey)) # number of records that would be removed (length() on a data frame counts columns, not rows)
# only 28 records will be removed, which we'll consider negligible for the remainder of the analysis
# THIS IS WHERE I'M HAVING THE ISSUE
sqldf("select distinct Flightcancelled from projsurvey")
na.omit(projsurvey)
projsurvey <- na.omit(projsurvey)
sqldf("select distinct Flightcancelled from projsurvey")
str(projsurvey)
Satisfaction | Airline Status | Age | Gender | Price Sensitivity | Year of First Flight | No of Flights p.a. | % of Flight with other Airlines | Type of Travel | No. of other Loyalty Cards | Shopping Amount at Airport | Eating and Drinking at Airport | Class | Day of Month | Flight date | Airline Code | Airline Name | Orgin City | Origin State | Destination City | Destination State | Scheduled Departure Hour | Departure Delay in Minutes | Arrival Delay in Minutes | Flight cancelled | Flight time in minutes | Flight Distance | Arrival Delay greater 5 Mins
3 | Silver | 46 | Female | 2 | 2004 | 24 | 6 | Personal Travel | 0 | 0 | 35 | Eco | 11 | 2/11/2014 | MQ | EnjoyFlying Air Services | Washington, DC | Virginia | Nashville, TN | Tennessee | 7 | 0 | 6 | No | 103 | 562 | yes
3 | Silver | 43 | Female | 2 | 2011 | 7 | 14 | Personal Travel | 1 | 120 | 110 | Eco | 20 | 1/20/2014 | MQ | EnjoyFlying Air Services | Washington, DC | Virginia | Nashville, TN | Tennessee | 20 | 3 | 6 | No | 97 | 562 | yes
2 | Blue | 66 | Male | 1 | 2005 | 25 | 6 | Personal Travel | 0 | 15 | 165 | Eco | 11 | 1/11/2014 | MQ | EnjoyFlying Air Services | Washington, DC | Virginia | Nashville, TN | Tennessee | 11 | 30 | 41 | No | 105 | 562 | yes
3 | Blue | 75 | Female | 2 | 2003 | 34 | 10 | Personal Travel | 0 | 0 | 95 | Eco | 2 | 2/2/2014 | MQ | EnjoyFlying Air Services | Washington, DC | Virginia | Nashville, TN | Tennessee | 13 | 0 | 2 | No | 103 | 562 | no
3 | Silver | 71 | Male | 2 | 2007 | 17 | 6 | Personal Travel | 0 | 40 | 40 | Eco | 10 | 2/10/2014 | MQ | EnjoyFlying Air Services | Washington, DC | Virginia | Nashville, TN | Tennessee | 7 | 103 | 117 | No | 105 | 562 | yes
2 | Blue | 80 | Male | 1 | 2004 | 53 | 6 | Personal Travel | 0 | 0 | 30 | Eco | 18 | 2/18/2014 | MQ | EnjoyFlying Air Services | Washington, DC | Virginia | Nashville, TN | Tennessee | 20 | 88 | 84 | No | 90 | 562 | yes
2 | Blue | 67 | Female | 1 | 2007 | 42 | 1 | Personal Travel | 0 | 0 | 115 | Eco | 6 | 3/6/2014 | MQ | EnjoyFlying Air Services | Washington, DC | Virginia | Nashville, TN | Tennessee | 20 | 42 | 37 | No | 87 | 562 | yes
2 | Blue | 16 | Female | 1 | 2007 | 8 | 4 | Personal Travel | 3 | 0 | 30 | Eco | 16 | 2/16/2014 | MQ | EnjoyFlying Air Services | Washington, DC | Virginia | Nashville, TN | Tennessee | 20 | 7 | 15 | No | 94 | 562 | yes
3 | Blue | 31 | Female | 1 | 2005 | 43 | 5 | Personal Travel | 3 | 15 | 25 | Eco | 23 | 1/23/2014 | MQ | EnjoyFlying Air Services | Washington, DC | Virginia | Nashville, TN | Tennessee | 13 |  |  | Yes |  | 562 | no
4 | Silver | 58 | Female | 1 | 2004 | 4 | 3 | Business travel | 0 | 0 | 0 | Eco Plus | 31 | 3/31/2014 | MQ | EnjoyFlying Air Services | Washington, DC | Virginia | Nashville, TN | Tennessee | 20 | 54 | 49 | No | 84 | 562 | yes
3 | Blue | 36 | Female | 2 | 2003 | 29 | 39 | Business travel | 5 | 0 | 90 | Eco Plus | 31 | 1/31/2014 | MQ | EnjoyFlying Air Services | Washington, DC | Virginia | Nashville, TN | Tennessee | 16 | 115 | 116 | No | 97 | 562 | yes
When I run na.omit, all of the "Yes" values in the Flightcancelled field change to "No". Can anyone explain why?
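One way to narrow this down might be to check whether the rows with Flightcancelled = "Yes" are exactly the rows that contain an NA somewhere, since na.omit drops whole rows rather than editing values. The sketch below is only an illustration under assumptions: `toy` is a made-up stand-in for projsurvey with invented values, it uses the cleaned column names from the code above, and it assumes blank cells in the spreadsheet are read in as NA.

```r
# Hypothetical stand-in for projsurvey; the values are invented for illustration
toy <- data.frame(
  Flightcancelled       = c("No", "No", "Yes", "No", "Yes"),
  ArrivalDelayinMinutes = c(6, 41, NA, 2, NA),
  Flighttimeinminutes   = c(103, 105, NA, 103, NA),
  FlightDistance        = c(562, 562, 562, 562, 562),
  stringsAsFactors      = FALSE
)

# Flag rows that have at least one NA in any column
incomplete <- !complete.cases(toy)

# Cross-tabulate cancellation status against "row has an NA"
na_by_status <- table(toy$Flightcancelled, incomplete)
print(na_by_status)

# na.omit() removes every flagged row; it does not change any cell values
cleaned <- na.omit(toy)
remaining <- unique(cleaned$Flightcancelled)
print(remaining)
```

Running the same `table(projsurvey$Flightcancelled, !complete.cases(projsurvey))` on the real data, before the na.omit call, would show whether the "Yes" rows are being dropped rather than rewritten.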