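I've been working with this dataset, which started out as an Excel worksheet. After loading it into RStudio I ran into a few problems.
First, I tried to change all the blank cells using: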
CarparkData[is.na(CarparkData)] <- 0
It only changes cells that were not originally blank.
Second, to remove duplicate rows I used the following code, but nothing happened.
library("dplyr")
install.packages("tidyverse")
library(tidyverse)
x <-CarparkData
duplicated(x)
x[duplicated(x),]
x[!duplicated(x),]
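Since I have a column for date and time, I want to use that column to decide which rows are duplicates. Some rows hold the same values but were recorded at different times, and those should stay; only rows with the same values and the same date and time should be removed.
Third, replacing missing values. Some cells contain FULL. I want to go into a single column and change FULL to the number of spaces at which that particular car park is full, so only the FULL cells in that column change, not every FULL cell.
Sample data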
> dput(head(CarparkData))
structure(list(Parnell = c(188L, 183L, 185L, 229L, 237L, 272L
), Ilac = c(665, 683, 694, 769, 786, 839), Jervis = c(421, 408,
403, 417, 423, 455), Arnotts = c(340, 344, 350, 359, 359, 355
), Malboro = c(160L, 160L, 156L, 157L, 173L, 207L), Abbey = c(0,
0, 0, 0, 0, 0), `Thomas Street` = c(173, 173, 173, 186, 189,
198), `Christ Church` = c(77, 76, 74, 73, 83, 91), Setanta = structure(c(24L,
23L, 23L, NA, NA, 46L), .Label = c("10", "100", "101", "102",
"103", "104", "107", "108", "110", "111", "112", "113", "114",
"115", "120", "123", "125", "128", "129", "131", "14", "17",
"19", "21", "24", "27", "28", "29", "30", "31", "32", "34", "36",
"39", "40", "44", "45", "47", "48", "51", "52", "53", "56", "57",
"6", "60", "63", "66", "67", "7", "70", "72", "74", "78", "79",
"80", "81", "82", "84", "85", "86", "89", "9", "91", "92", "93",
"94", "96", "98", "FULL"), class = "factor"), Dawson = c(70,
87, 83, 118, 122, 140), Trinity = c(142L, 143L, 145L, 165L, 167L,
191L), Greenrcs = structure(c(NA, 8L, 9L, NA, 4L, 5L), .Label = c("1125",
"157", "205", "250", "262", "264", "266", "267", "270", "296",
"305", "311", "319", "320", "324", "327", "342", "347", "350",
"353", "364", "371", "374", "375", "378", "379", "459", "463",
"591", "729", "754", "761", "879", "902", "903", "907", "911",
"913", "916", "917", "922", "931", "944", "955", "974", "985",
"FULL"), class = "factor"), Drury = c(148, 143, 147, 182, 193,
235), `Brown Thomas` = c(230, 231, 0, 267, 272, 293), `Date & Time` = structure(1:6, .Label = c("2019-03-19 13:43:33",
"2019-03-19 13:55:39", "2019-03-19 14:07:35", "2019-03-19 15:45:02",
"2019-03-19 16:00:02", "2019-03-19 16:45:03", "2019-03-19 17:00:02",
"2019-03-19 17:45:03", "2019-03-19 18:00:01", "2019-03-19 18:00:02",
"2019-03-19 18:45:03", "2019-03-19 19:00:01", "2019-03-19 19:00:02",
"2019-03-19 19:07:12", "2019-03-19 19:45:03", "2019-03-19 20:00:01",
"2019-03-19 20:00:02", "2019-03-19 20:45:03", "2019-03-19 21:00:01",
"2019-03-19 21:00:03", "2019-03-19 21:45:04", "2019-03-19 22:00:01",
"2019-03-19 22:00:03", "2019-03-19 22:45:04", "2019-03-19 23:00:01",
"2019-03-19 23:00:02", "2019-03-19 23:00:03", "2019-03-19 23:45:04",
"2019-03-20 00:00:01", "2019-03-20 00:00:02", "2019-03-20 00:00:03",
"2019-03-20 00:45:04", "2019-03-20 01:00:01", "2019-03-20 01:00:02",
"2019-03-20 01:00:03", "2019-03-20 01:45:04", "2019-03-20 02:00:01",
"2019-03-20 02:00:02", "2019-03-20 02:00:03", "2019-03-20 02:45:04",
"2019-03-20 03:00:01", "2019-03-20 03:00:02", "2019-03-20 03:00:03",
"2019-03-20 03:45:05", "2019-03-20 04:00:01", "2019-03-20 04:00:02",
"2019-03-20 04:00:04", "2019-03-20 04:45:05", "2019-03-20 05:00:01",
"2019-03-20 05:00:02",
Thanks.
Answer 0 (score: 0)
First problem: if you want to explicitly set all empty cells to NA, you can use a custom function like this:
empty_as_na <- function(x) {
  if ("factor" %in% class(x)) x <- as.character(x)  # ifelse() won't work with factors
  ifelse(as.character(x) != "", x, NA)
}
Then apply the function:
dplyr::mutate_all(df, .funs = empty_as_na)
where df is your data frame.
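As a minimal sketch of how this behaves, the example below applies the same function to a small made-up data frame. The `toy` data and the `cleaned` name are assumptions for illustration only, and `across(everything(), ...)` assumes dplyr 1.0.0 or later, where it is the current spelling of `mutate_all()`.

```r
library(dplyr)

# Hypothetical toy data: an empty string stands in for a blank Excel cell
toy <- data.frame(
  Setanta = c("24", "", "FULL"),
  Dawson  = c(70, 87, 83),
  stringsAsFactors = TRUE
)

empty_as_na <- function(x) {
  if ("factor" %in% class(x)) x <- as.character(x)  # ifelse() won't work with factors
  ifelse(as.character(x) != "", x, NA)
}

# Apply the function to every column and keep the result by assigning it
cleaned <- mutate(toy, across(everything(), empty_as_na))

is.na(cleaned$Setanta)  # FALSE TRUE FALSE: only the blank cell became NA
```

Note that factor columns come back as character columns, so a column such as Setanta still needs converting to numbers once FULL has been dealt with.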
Second problem: to remove duplicate rows, take a look at dplyr::distinct()
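A possible way to use it for the date and time case is sketched below. The `toy` data frame is made up for illustration; only the `Date & Time` column name comes from the question. One thing worth checking in the original attempt: `x[!duplicated(x), ]` only prints the de-duplicated rows, so unless the result is assigned back, `x` itself stays unchanged.

```r
library(dplyr)

# Hypothetical toy data: rows 1 and 2 share a timestamp,
# row 3 repeats the same count at a later time
toy <- data.frame(
  Parnell = c(188L, 188L, 188L),
  `Date & Time` = c("2019-03-19 13:43:33",
                    "2019-03-19 13:43:33",
                    "2019-03-19 13:55:39"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep the first row for each timestamp; .keep_all = TRUE retains the other columns
deduped <- distinct(toy, `Date & Time`, .keep_all = TRUE)

nrow(deduped)  # 2: the later reading with the same count is kept
```

Calling `distinct(toy)` with no column names would instead drop only rows that match in every column.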
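Third problem: I didn't understand what the issue is. Could you clarify?
Sorry that I can't give you a complete working example with the data you provided, but these functions should get you where you need to go.
Edit
Solution to the third problem, based on the comments.
It may not be the most elegant solution but, again, I'm limited because no reprex was provided.
Let df be the data frame, column_new the new column, column_number the column that contains either numbers or FULL, and column_car the column naming the car park.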
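library(dplyr)

df %>%
  mutate(
    column_new = case_when(
      column_number == "FULL" & column_car == "car_a" ~ 300,
      column_number == "FULL" & column_car == "car_b" ~ 500,
      # every branch must return the same type, so convert the text to numbers;
      # "FULL" turns into NA here (with a coercion warning) but is handled above
      TRUE ~ as.numeric(as.character(column_number))
    )
  )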
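The code above assumes one row per car park. The sample data in the question appears to be laid out the other way, with one column per car park (Setanta and Greenrcs are the factor columns that contain FULL), so a column-by-column replacement may be closer to what is needed. The sketch below uses base R; the capacity figures and the `toy` data are hypothetical placeholders, not real values for these car parks.

```r
# Hypothetical capacities; substitute the real figure for each car park
capacity <- c(Setanta = 150, Greenrcs = 1000)

# Hypothetical toy data shaped like the question's factor columns
toy <- data.frame(
  Setanta  = factor(c("24", "FULL", NA)),
  Greenrcs = factor(c("FULL", "267", "270"))
)

for (col in names(capacity)) {
  # factor -> text first, so the labels are used rather than the integer codes
  values <- as.character(toy[[col]])
  # which() skips NA entries, so missing cells are left alone
  values[which(values == "FULL")] <- capacity[[col]]
  toy[[col]] <- as.numeric(values)
}

toy$Setanta   # 24 150 NA
toy$Greenrcs  # 1000 267 270
```

Each column gets its own replacement value, so a FULL in Setanta never receives the Greenrcs figure, and the columns end up numeric.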