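na.omit changes other columns

Time: 2019-06-15 21:04:39

Tags: r

I am importing a dataset with openxlsx and running na.omit to remove NAs. When I run it, a Yes/No field has all of its "Yes" values changed to "No", so the whole column ends up with the same value.

You will see in the code that I do some other cleanup on the data, but I can't tell whether any of it affects the na.omit problem I'm having.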

library(openxlsx)
library(sqldf)

# Import survey data from file
projsurvey <- read.xlsx("SatisfactionSurvey2_2.xlsx", sheet = 1)

# Make field names SQL friendly by removing spaces and special characters/punctuation
fields <- colnames(projsurvey)
fields <- gsub(" ","",fields)
fields <- gsub("\\.","",fields)
fields[8] <- "PercentofFlightWithOtherAirlines"
colnames(projsurvey) <- fields
# clean up Satisfaction data and convert field to numeric
# sqldf("select distinct Satisfaction from projsurvey") # to identify bad data
projsurvey <- projsurvey[projsurvey$Satisfaction!="4.00.2.00",]
projsurvey <- projsurvey[projsurvey$Satisfaction!="4.00.5",]
projsurvey$Satisfaction <- as.numeric(as.character(projsurvey$Satisfaction)) # change field type to numeric
# sqldf("select distinct Satisfaction from projsurvey")

# Change 'yes' or 'no' fields to 1 and 0 indicator
# projsurvey[,"Flightcancelled"] <- sqldf("select case when lower(Flightcancelled) = 'yes' then 1 when lower(Flightcancelled) = 'no' then 0 end from projsurvey")
# projsurvey[,"ArrivalDelaygreater5Mins"] <- sqldf("select case when lower(ArrivalDelaygreater5Mins) = 'yes' then 1 when lower(ArrivalDelaygreater5Mins) = 'no' then 0 end from projsurvey")

# Change TypeofTravel field to indicators
# projsurvey[,"TypeofTravel"] <- sqldf("select case when lower(TypeofTravel) = 'mileage tickets' then 1 when lower(TypeofTravel) = 'business travel' then 2 when lower(TypeofTravel) = 'personal travel' then 3 end from projsurvey")

# remove NAs from data
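nrow(projsurvey) - nrow(na.omit(projsurvey)) # number of records na.omit would remove
# note: the original check, length(na.omit(projsurvey)), returned 28, which is the number of columns in the data frame, not the number of records removed; confirm the count above before treating the loss as negligible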

# THIS IS WHERE I'M HAVING THE ISSUE
sqldf("select distinct Flightcancelled from projsurvey")
na.omit(projsurvey)
projsurvey <- na.omit(projsurvey)
sqldf("select distinct Flightcancelled from projsurvey")
str(projsurvey)

Sample data:
Satisfaction    Airline Status  Age Gender  Price Sensitivity   Year of First Flight    No of Flights p.a.  % of Flight with other Airlines Type of Travel  No. of other Loyalty Cards  Shopping Amount at Airport  Eating and Drinking at Airport  Class   Day of Month    Flight date Airline Code    Airline Name    Orgin City  Origin State    Destination City    Destination State   Scheduled Departure Hour    Departure Delay in Minutes  Arrival Delay in Minutes    Flight cancelled    Flight time in minutes  Flight Distance Arrival Delay greater 5 Mins
3   Silver  46  Female  2   2004    24  6   Personal Travel 0   0   35  Eco 11  2/11/2014   MQ  EnjoyFlying Air Services    Washington, DC  Virginia    Nashville, TN   Tennessee   7   0   6   No  103 562 yes
3   Silver  43  Female  2   2011    7   14  Personal Travel 1   120 110 Eco 20  1/20/2014   MQ  EnjoyFlying Air Services    Washington, DC  Virginia    Nashville, TN   Tennessee   20  3   6   No  97  562 yes
2   Blue    66  Male    1   2005    25  6   Personal Travel 0   15  165 Eco 11  1/11/2014   MQ  EnjoyFlying Air Services    Washington, DC  Virginia    Nashville, TN   Tennessee   11  30  41  No  105 562 yes
3   Blue    75  Female  2   2003    34  10  Personal Travel 0   0   95  Eco 2   2/2/2014    MQ  EnjoyFlying Air Services    Washington, DC  Virginia    Nashville, TN   Tennessee   13  0   2   No  103 562 no
3   Silver  71  Male    2   2007    17  6   Personal Travel 0   40  40  Eco 10  2/10/2014   MQ  EnjoyFlying Air Services    Washington, DC  Virginia    Nashville, TN   Tennessee   7   103 117 No  105 562 yes
2   Blue    80  Male    1   2004    53  6   Personal Travel 0   0   30  Eco 18  2/18/2014   MQ  EnjoyFlying Air Services    Washington, DC  Virginia    Nashville, TN   Tennessee   20  88  84  No  90  562 yes
2   Blue    67  Female  1   2007    42  1   Personal Travel 0   0   115 Eco 6   3/6/2014    MQ  EnjoyFlying Air Services    Washington, DC  Virginia    Nashville, TN   Tennessee   20  42  37  No  87  562 yes
2   Blue    16  Female  1   2007    8   4   Personal Travel 3   0   30  Eco 16  2/16/2014   MQ  EnjoyFlying Air Services    Washington, DC  Virginia    Nashville, TN   Tennessee   20  7   15  No  94  562 yes
3   Blue    31  Female  1   2005    43  5   Personal Travel 3   15  25  Eco 23  1/23/2014   MQ  EnjoyFlying Air Services    Washington, DC  Virginia    Nashville, TN   Tennessee   13          Yes     562 no
4   Silver  58  Female  1   2004    4   3   Business travel 0   0   0   Eco Plus    31  3/31/2014   MQ  EnjoyFlying Air Services    Washington, DC  Virginia    Nashville, TN   Tennessee   20  54  49  No  84  562 yes
3   Blue    36  Female  2   2003    29  39  Business travel 5   0   90  Eco Plus    31  1/31/2014   MQ  EnjoyFlying Air Services    Washington, DC  Virginia    Nashville, TN   Tennessee   16  115 116 No  97  562 yes

When I run na.omit, all of the "Yes" values in the Flightcancelled field change to "No". Can anyone explain why?
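
One possible explanation, suggested by the sample rows: the only row with Flight cancelled = Yes also has blank delay and flight-time fields. na.omit does not rewrite values; it drops every row that contains at least one NA. If all cancelled flights have NAs in those fields, every "Yes" row is removed and only "No" remains. The sketch below reproduces this with made-up data; the `toy` data frame and its values are hypothetical and are not taken from the real survey file.

```r
# Hypothetical toy data loosely modeled on the sample rows (not the real file).
toy <- data.frame(
  Satisfaction          = c(3, 2, 3, 4),
  ArrivalDelayinMinutes = c(6, 41, NA, 49),    # missing for the cancelled flight
  Flightcancelled       = c("No", "No", "Yes", "No"),
  Flighttimeinminutes   = c(103, 105, NA, 84), # missing for the cancelled flight
  stringsAsFactors      = FALSE
)

# Count missing values per column to see which fields cause rows to be dropped.
na_per_column <- colSums(is.na(toy))

# Cross-tabulate the Yes/No field against "row has at least one NA".
incomplete <- !complete.cases(toy)
na_by_cancelled <- table(toy$Flightcancelled, incomplete)

# na.omit() removes whole rows; values in the remaining rows are unchanged.
cleaned <- na.omit(toy)
rows_removed <- nrow(toy) - nrow(cleaned)

print(na_per_column)
print(na_by_cancelled)
print(unique(cleaned$Flightcancelled))
print(rows_removed)
```

Running the same `colSums(is.na(...))` and `table(...)` checks on projsurvey before calling na.omit would show whether this is what happens in the real data. If it is, one option is to test for NAs only in the columns the analysis needs, for example with `complete.cases()` on a subset of columns, so that cancelled flights are kept.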

0 answers:

No answers