我有来自https://drive.google.com/file/d/0B9YMMvghK2ytSXI4RFo0clNLc28/view
的数据基本上是~600,000行的钻石数据集
它缺少一个列价格的值。我想用特定颜色的中位数价格替换NA值
summary(BigDiamonds)
## X1 carat cut color
## Min. : 1 Min. :0.200 Length:598024 Length:598024
## 1st Qu.:149507 1st Qu.:0.500 Class :character Class :character
## Median :299013 Median :0.900 Mode :character Mode :character
## Mean :299013 Mean :1.071
## 3rd Qu.:448518 3rd Qu.:1.500
## Max. :598024 Max. :9.250
##
## clarity table depth cert
## Length:598024 Min. : 0.00 Min. : 0.00 Length:598024
## Class :character 1st Qu.:56.00 1st Qu.:61.00 Class :character
## Mode :character Median :58.00 Median :62.10 Mode :character
## Mean :57.63 Mean :61.06
## 3rd Qu.:59.00 3rd Qu.:62.70
## Max. :75.90 Max. :81.30
##
## measurements price x y
## Length:598024 Min. : 300 Min. : 0.150 Min. : 1.000
## Class :character 1st Qu.: 1220 1st Qu.: 4.740 1st Qu.: 4.970
## Mode :character Median : 3503 Median : 5.780 Median : 6.050
## Mean : 8753 Mean : 5.991 Mean : 6.199
## 3rd Qu.:11174 3rd Qu.: 6.970 3rd Qu.: 7.230
## Max. :99990 Max. :13.890 Max. :13.890
## NA's :713 NA's :1815 NA's :1852
## z
## Min. : 0.040
## 1st Qu.: 3.120
## Median : 3.860
## Mean : 4.033
## 3rd Qu.: 4.610
## Max. :13.180
## NA's :2544
和
table(BigDiamonds$color)
##
## D E F G H I J K L
## 73630 93483 93573 96204 86619 70282 48709 25868 9656
Diamonds2=BigDiamonds[is.na(BigDiamonds$price),]
Diamonds3=BigDiamonds[is.na(BigDiamonds$price)==F,]
library(Hmisc)
summarize(Diamonds3$price,Diamonds3$color,median)
## Diamonds3$color Diamonds3$price
## 1 D 2690
## 2 E 2342
## 3 F 2966
## 4 G 3720
## 5 H 4535
## 6 I 4717
## 7 J 4697
## 8 K 4418
## 9 L 3017
我尝试了这个,但它不起作用
Diamonds21=select(Diamonds2,price,color,cut)
Diamonds21$newprice=ifelse(Diamonds21$color=="J",4697,Diamonds21$newprice)
Diamonds21$newprice<-ifelse(Diamonds21$color=="D",2690,Diamonds21$newprice)
Diamonds21$newprice<-ifelse(Diamonds21$color=="E",2342,Diamonds21$newprice)
Diamonds21$newprice<-ifelse(Diamonds21$color=="F",2966,Diamonds21$newprice)
Diamonds21$newprice<-ifelse(Diamonds21$color=="G",3720,Diamonds21$newprice)
Diamonds21$newprice<-ifelse(Diamonds21$color=="H",4535,Diamonds21$newprice)
Diamonds21$newprice<-ifelse(Diamonds21$color=="I",4717,Diamonds21$newprice)
Diamonds21$newprice<-ifelse(Diamonds21$color=="K",4418,Diamonds21$newprice)
Diamonds21$newprice<-ifelse(Diamonds21$color=="L",3017,Diamonds21$newprice)
我的逻辑错了吗?
答案 0 :(得分:1)
根据最适合您需求的方式,您可以通过几种不同的方式来解决这个问题。
首先,让我们设置一个缺少stackalloc
值的钻石数据集:
val ppush = stackalloc[Ptr[czmq.zsock_t]]
val ppull = stackalloc[Ptr[czmq.zsock_t]]
!ppush = push
!ppull = pull
czmq.zsock_destroy(ppush)
czmq.zsock_destroy(ppull)
现在钻石数据集中有20%的price
值缺失。
您可以使用library(dplyr)
data(diamonds, package = "ggplot2")
diamonds_missing <- diamonds %>%
mutate(price = ifelse(sample(1:0,
size = length(diamonds$price),
replace = TRUE,
prob = c(0.8, 0.2)),
price, NA))
和price
将其替换为中位数:
mutate()
或者如果您愿意,可以使用tidyr包中的ifelse()
函数:
diamonds_missing %>%
mutate(price = ifelse(is.na(price), median(price, na.rm = TRUE), price))
#> # A tibble: 53,940 × 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
#> 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
#> 3 0.23 Good E VS1 56.9 65 2396 4.05 4.07 2.31
#> 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
#> 5 0.31 Good J SI2 63.3 58 2396 4.34 4.35 2.75
#> 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
#> 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
#> 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
#> 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
#> 10 0.23 Very Good H VS1 59.4 61 338 4.00 4.05 2.39
#> # ... with 53,930 more rows
答案 1 :(得分:0)
我试过这个并且有效
注意第一行是不同的
Diamonds21$newprice=ifelse(Diamonds21$color=="J",4697,Diamonds21$price)
Diamonds21$newprice<-ifelse(Diamonds21$color=="D",2690,Diamonds21$newprice)
Diamonds21$newprice<-ifelse(Diamonds21$color=="E",2342,Diamonds21$newprice)
Diamonds21$newprice<-ifelse(Diamonds21$color=="F",2966,Diamonds21$newprice)
Diamonds21$newprice<-ifelse(Diamonds21$color=="G",3720,Diamonds21$newprice)
Diamonds21$newprice<-ifelse(Diamonds21$color=="H",4535,Diamonds21$newprice)
Diamonds21$newprice<-ifelse(Diamonds21$color=="I",4717,Diamonds21$newprice)
Diamonds21$newprice<-ifelse(Diamonds21$color=="K",4418,Diamonds21$newprice)
Diamonds21$newprice<-ifelse(Diamonds21$color=="L",3017,Diamonds21$newprice)