Identifying and correcting misspellings in a dataset using subset

Time: 2017-11-06 02:55:15

Tags: r subset data-cleaning levels

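I have a dataset, which can be seen at the link below: https://drive.google.com/file/d/0B4Mldbnr1-avMDIxYmZLSnRfUDA/view?usp=sharing

I want to use the subset and levels functions. This is what I have been trying to apply, but it does not seem to work: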

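I really need your help. This should be easy, since I followed exactly what was given in the lecture, but the code does nothing when I run it.

Thanks for your help,
Nelson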

Requested output:

# Setting working directory
setwd("F:/Intro Data Science/Assignment Part B/Assignment Part B-20170902")
plot.new()
options(digits=2)

# Loading packages
# install.packages("lubridate")    # run once if lubridate is not installed yet
library(lubridate)

# Reading data set
power <- read.csv("data set 6.csv", na.strings="")

# SUBSETTING
Area <- as.numeric(power$Area)
City <- as.character(power$City)
P.Winter <- as.numeric(power$P.Winter)
P.Summer <- as.numeric(power$P.Summer)

#Data Cleaning
levels(power$City)<- c(levels(power$City),"Auckland")
power$City[power$City == "Ackland"] <- "Auckland"

1 Answer:

Answer 0 (score: 1)

I believe the function you want is droplevels. First, make up some data.

set.seed(5295)    # make the results reproducible
cities <- factor(sample(c("Ackland", "Auckland", "Wellington", "Sidney"), 100, TRUE))
power <- data.frame(City = cities)

Now the code, starting with yours.

power$City[power$City == "Ackland"] <- "Auckland"
power$City <- droplevels(power$City)

levels(power$City)    # check if it worked
#[1] "Auckland"   "Sidney"     "Wellington"
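
To see why droplevels is needed here, the following is a minimal sketch on a small made-up factor (the vectors city and city2 are illustrative, not taken from the question's data). It shows that replacing a value leaves the old level behind as an unused level, that droplevels removes it, and that renaming the level directly is an alternative that merges the two levels in one step.

```r
# Toy factor for illustration only (not the asker's data)
city <- factor(c("Ackland", "Auckland", "Wellington", "Auckland"))

# Replacing the value works because "Auckland" is already a level,
# but the old level "Ackland" stays in the level set
city[city == "Ackland"] <- "Auckland"
levels(city)    # "Ackland" is still listed
table(city)     # the unused level appears with a count of 0

# droplevels() removes levels that no longer have any observations
city <- droplevels(city)
levels(city)

# Alternative: rename the level itself; duplicate level names are merged
city2 <- factor(c("Ackland", "Auckland", "Wellington", "Auckland"))
levels(city2)[levels(city2) == "Ackland"] <- "Auckland"
levels(city2)
```

Both routes end with the same two levels. Renaming the level avoids the separate droplevels call, while the replace-then-drop route also works when the correct spelling has to be added as a new level first.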

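Edit
After seeing the output of dput(head(power, 30)), the solution became obvious. The column City is of class character, not factor, and it contains neither the value "Ackland" nor "Auckland": both have a trailing white space. So all we need to do is replace "Ackland " and remove the trailing white space.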

str(power)
#'data.frame':   30 obs. of  4 variables:
# $ Area    : num  144 177 269 209 124 ...
# $ City    : chr  "Auckland " "Auckland " "Auckland " "Auckland " ...
# $ P.Winter: num  1685 1927 2027 1938 1580 ...
# $ P.Summer: num  1194 1487 1737 -158 1148 ...

which(power$City == "Ackland ")    # note the white space
#[1] 18

which(power$City == "Auckland ")    # note the white space
# [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 19 20 21 22 23 24 25 26
#[26] 27 28 29 30

# remove the value "Ackland ", with white space
power$City[power$City == "Ackland "] <- "Auckland"
power$City <- trimws(power$City)    # remove white spaces from all of them

No column disappears; just run str(power) to check.
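
When the misspelling is not known in advance, one possible way to find it is sketched below. The vector city is hypothetical and only mimics the situation above (trailing spaces plus one typo). The sketch uses dput to make the white space visible, table to spot rare values, and adist from the utils package to measure the edit distance between the distinct values. Treat it as a rough aid for inspection, not as an automatic correction.

```r
# Hypothetical character column with trailing spaces and one misspelling
city <- c("Auckland ", "Auckland ", "Ackland ", "Wellington", "Auckland ")

# dput() prints the strings in quotes, so trailing spaces become visible
dput(unique(city))

# Trim first, then tabulate: rare values are candidates for misspellings
clean <- trimws(city)
counts <- table(clean)
counts

# Edit distance between the distinct values; small distances hint at typos
vals <- names(counts)
d <- adist(vals)
dimnames(d) <- list(vals, vals)
d

# After confirming the typo by eye, recode it explicitly
clean[clean == "Ackland"] <- "Auckland"
unique(clean)
```

A small distance only suggests a typo, since two genuinely different city names can also be close, so each candidate should be checked before recoding. As a separate option, read.csv accepts strip.white = TRUE, which strips white space from unquoted character fields at import time.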