Converting multiple columns of numeric data to dates in R

Posted: 2018-07-31 17:01:53

Tags: r class date multiple-columns

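My data frame has 4 date columns, but when I load the file from Excel into R, all of the date columns become numbers. I don't want to write a separate line to convert each column to a date, because this is something I do often with this dataset, so I wrote the loop below to change the type. It doesn't work as I hoped: the years come out in the 2000s instead of the original 1900s. I've included a sample dataset and my code below: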

xl <- structure(list(DOB = c(33483, 19213, 18947, 25266, 14581, 22870, 23705, 19592, 15033, 17856, 15551, 33681, 23483, 34619, 29125,31824, 18560, 35009, 16994, 22052, 17111, 28724, NA, 24852, 10980, 34222, 32220,18262, 16141, 28075, 11058, 23102, 26111, 30951, 14429, 25017,28281, 13239, 33977, 17309, 28103, 12115, 21331, 13217, 22898, 31491, 19787, 20160, 12364, 10609, 33846, 22699, 30428, 19421, 33339, 31575, 35187, 25053, 25500, 9291, 19100, 33025, 20040, 22909, 28189, 31909, 34476, 29007, 25575, 24127, 17493, 19572, 29032, 35241, 16353, 17038, 17623, 28056, 16408, 27879, 31161, 25669, 35614, 30573, 21878, 35815, 28826, 24351, 19828, 27159, 22897, 25779, 30880, 30344, 18643, 23748, 24340, 23784, 31276, 25795, 16908, 34277, 22550, 18824, 13795, 34548, 34940, 17395, 22603, 28913, 19478, 16117, 29331, 29557, 16459,32665, 35092, 33810, 13710, 34611, 26339, 33712, 35505, 17427, 29238, 30557, 21994, 23020, 20084, 23647, 21838, 9421, 33657, 14433, 22284, 33857, 31064, 35270, 33380, 21866, 15317, 35466, 29503, 33401, 27059, 19315, 23095, 28487, 35434, 15403, 21563, 22801, 27079, 24511, 18215, 16171, 16601, 29396, 24118, 21030, 24544, 12856, 35721, 11105, 23213, 35322, 15290, 20132, 23691, 30587, 27723, 30233, 28173, 30811, 33259, 12814, 36117, 14638, 34681, 13191, 23205, 14160, 20210, 35569, 31310, 16329, 26409, 20704, 32217, 28347, 21187, 15977, 31470, 28644, 15303, 31341, 18369, 16545, 24221, 19052, 34062, 28375, 33067, 17319, 32124, 15140, 24736, 23447, 12800, 27580, 18167, 34765, 31025, 21441, 16035, 21086, 21330,26485, 16274, 14136, 28513, 28381, 19584, 8446, 20227, 19866, 17269, 22108, 28557, 13340, 13953, 18622), 
D1 = c(40886, 40890, 40944, 40947, 40941, 
40948, 40948, 41199, 40967, 41053, 40974, 40981, 41114, 41094, 
41116, 41123, 41135, 41150, 41194, 41226, 41317, 41212, 41213, 
41297, 41267, 41267, 41295, 41506, 41310, 41310, 41316, 41318, 
41319, 41323, 41502, 41326, 41331, 41339, 41381, 41360, 41372, 
41373, 41382, 41407, 41444, 41450, 41457, 41458, 41459, 41486, 
41488, 41488, 41488, 41488, 41500, 41535, 41533, 41543, 41554, 
41561, 41565, 41582, 41592, 41606, 41624, 41624, 41682, 41682, 
41683, 41690, 41696, 41704, 41711, 41715, 41715, 41701, 41732, 
41739, 41760, 41774, 41792, 41795, 41813, 41815, 41816, 41816, 
41821, 41823, 41824, 41841, 41844, 41850, 41849, 41850, 41852, 
41856, 41858, 41862, 41873, 41873, 41877, 41878, 41879, 41880, 
41880, 41887, 41887, 41887, 41891, 41891, 41893, 41899, 41901, 
41905, 41906, 41907, 41907, 41911, 41887, 41921, 41925, 41928, 
41928, 41934, 41939, 41942, 41943, 41947, 41947, 41953, 41954, 
41955, 41968, 41977, 41978, 41981, 41984, 41991, 41992, 42020, 
42023, 42031, 42032, 42040, 42041, 42047, 42047, 42054, 42065, 
42059, 42061, 42069, 42073, 42075, 42079, 42102, 42123, 42131, 
42135, 42121, 42135, 42138, 42142, 42142, 42146, 42146, 42160, 
42165, 42173, 42174, 42174, 42187, 42195, 42202, 42201, 42142, 
42152, 42255, 42264, 42284, 42291, 42298, 42298, 42298, 42312, 
42174, 41505, 41519, 41638, 41723, 41848, 41862, 41862, 41885, 
41925, 41953, 42107, 42207, 40987, 41331, 41505, 41723, 41892, 
41926, 41960, 41985, 42144, 42188, 40961, 41058, 41108, 41200, 
41254, 41309, 41291, 41331, 41366, 41389, 41401, 41444, 41493, 
41610, 41694, 41718, 41806, 41873, 41956, 42019, 42037, 42164, 
42200, 41562), D2 = c(40695, 31205, 34135, 
40391, 39995, 40725, 40483, 41183, 40817, 39814, 33239, 40909, 
40725, 41030, 40756, 40969, 39326, 39814, 41061, 41061, 40909, 
40483, 36161, 37622, 40544, 40909, 40817, 39448, 40179, 32509, 
40238, 40575, 41030, 38353, 40969, 40787, 41061, 41030, 41214, 
40695, 41000, 41183, 39083, 39934, 40603, 39904, 40940, 41426, 
41214, 40725, 41426, 40695, 39814, 40179, 41183, 41275, 41218, 
41214, 40940, 41426, 40544, 40909, 38047, 41579, 34700, 35746, 
41000, 36161, 41426, 41183, NA, 38718, 41548, 41456, 38536, 
39387, 41548, 41518, 40360, 41699, 41778, 41655, 41030, 41730, 
40909, 40544, 41671, 41214, 41699, 39083, 41214, 41640, 41671, 
36161, 41426, 41821, 39083, 41275, 41000, 41760, 41579, 36526, 
41548, 37987, 40179, 40179, 40787, 41609, 41730, 40544, 38504, 
41334, 41334, 41609, 41275, 41699, 40817, 41214, 41334, 41518, 
35065, 35796, 41170, 41699, 41695, 41365, 41852, 37257, 41579, 
33604, 40909, 41913, 41852, 41564, 41852, 41883, 39448, 39083, 
40544, 41944, 41275, 41852, 41640, 42005, 41548, 39995, 30682, 
41883, 41546, 41640, 41791, 41334, 41944, 40179, 41995, 40179, 
23012, 39814, 41956, 39083, 41609, 39448, 41974, 41275, 40544, 
42125, 41928, 39814, 41944, 41962, 40909, 42095, 41852, 41913, 
41944, 40848, 42096, 40544, 40179, 41913, 40179, 42064, 41395, 
37622, 42156, 31048, 41314, 41377, 41452, 41623, 41813, 41760, 
41705, 41867, 41699, 41942, 41944, 42197, 29221, 40179, 41000, 
41153, 40544, 39448, 41548, 41760, 40179, 41821, 40909, 38353, 
39448, 41000, 40940, 41000, 40909, 41000, 40664, 41091, 41030, 
41395, 41306, 41061, 41518, 41334, 41609, 41852, 41760, 41821, 
41944, 42095, 42095, 41476), D3 = c(40817, 
40817, 40913, 40940, 40940, 40756, 40634, 41183, 40940, 41030, 
40817, 40969, 41091, 41091, 40787, 41122, 39448, 41030, 41183, 
41091, 41091, 40848, 36526, 41365, 41153, 40909, 41030, 41244, 
41122, 35065, 40544, 41122, 41061, 40179, 41183, 41306, 41214, 
41306, 41365, 41334, 41030, 41244, 41091, 41395, 41275, 41426, 
40940, 41456, 41365, 41456, 41456, 41426, 41456, 41395, 41487, 
41275, 41414, 41518, 41275, 41456, 41579, 41153, 41579, 41640, 
41334, 41395, 41487, 41579, 41426, 41671, 41671, 41699, 41699, 
41456, 41456, 39508, 41730, 41609, 41760, 41760, 41791, 41671, 
41365, 41791, 41821, 41699, 41821, 41548, 41821, 41821, 41579, 
41699, 41821, 41821, 41821, 41852, 41640, 41852, 41852, 41791, 
41852, 41852, 41852, 41760, 41852, 41640, 41518, 41852, 41883, 
41852, 41487, 41699, 41883, 41640, 41883, 41730, 41883, 41791, 
41883, 41671, 41699, 41671, 41671, 41883, 41863, 41913, 41852, 
41699, 41944, 41730, 41760, 41944, 41883, 41760, NA, 41974, 
41974, 41974, 41609, 41944, 42005, 41913, 41913, 42005, 42036, 
41913, 42036, 42036, 42064, 41944, 41944, 42064, 42675, 42064, 
42064, 42095, 42095, 42064, 41956, 42121, 41974, 42125, 42005, 
42125, 40544, 42125, 41974, 42156, 41944, 42005, 42005, 42095, 
41852, 42186, 42186, 42036, 42095, 42125, 42186, 42064, 42309, 
42217, 42278, 42278, 42309, 41609, 41487, 41365, 41609, 41699, 
42186, 41852, 41852, 41974, 41913, 41944, 42095, 42186, NA, 
41183, 41183, 41518, 41579, 41791, 41579, 41883, 42186, 42186, 
40940, 40787, 40725, 41030, 40940, 41153, 41153, 41306, 40817, 
41214, 41395, 41426, 41306, 41609, 41671, 41579, 41791, 41852, 
41852, 41821, 41974, 42156, 42156, 42522)), .Names = c("D", "D1", "D2", "D3"), class = "data.frame", row.names = c(NA, 232L))

date_cols <- c(1, 2, 3, 4)
for (j in date_cols) {
  class(xl[, j]) <- "Date"  # relabels each number as days since 1970-01-01
}
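To see why the loop above shifts the years, here is a minimal sketch using the first values of D and D1. It assumes the workbook uses Excel's default 1900 date system; `serials`, `wrong` and `right` are illustrative names, not part of the original code.

```r
# Two Excel serial numbers taken from the sample data (first values of D and D1)
serials <- c(33483, 40886)

# What the loop does: relabel the numbers as Dates.
# R's Date class counts days from 1970-01-01, so the values land on the wrong scale.
wrong <- serials
class(wrong) <- "Date"

# Excel's 1900 date system behaves as if it counted days from 1899-12-30
right <- as.Date(serials, origin = "1899-12-30")

print(wrong)  # years land in the 2060s and 2080s
print(right)  # "1991-09-02" "2011-12-09"

# The two scales differ by a constant number of days
print(as.numeric(wrong - right))  # 25569 25569
```

The shift is the same for every value, which is why simply relabelling the class cannot work: the origin has to be supplied explicitly.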

3 answers:

Answer 0 (score: 5)

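You need a function that tells R these numbers represent dates (counted from Excel's origin), and then you need to apply that function to every column of the data frame: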

myfun <- function(x) as.Date(x, format="%Y-%m-%d", origin="1899-12-30")
xlnew <- data.frame(lapply(xl, myfun))

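You can avoid defining myfun separately by passing the function anonymously, or by following one of the other answers. Also note that my environment has options(stringsAsFactors = FALSE) set to avoid unwanted conversions to factors (this is the default since R 4.0.0).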
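For example, the anonymous-function version could look like this on a small hypothetical subset of the data (`xl_small` is an illustrative name, built from the first values of D and D1 plus an NA like the one in the real data):

```r
# Small stand-in for the question's data frame, including a missing value
xl_small <- data.frame(D = c(33483, 19213, NA), D1 = c(40886, 40890, 40944))

# Same conversion as myfun, but with the function written inline
xlnew <- data.frame(lapply(xl_small, function(x) as.Date(x, origin = "1899-12-30")))

str(xlnew)  # both columns are now of class Date; the NA stays NA
```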

Logic:

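Excel dates start at 1900-01-01 with index 1, whereas R stores dates as days since 1970-01-01, with that origin at index 0. So there is a 1-day indexing difference: R's first date is day 0, not day 1. In addition, for historical reasons Excel has a bug: it treats 29-Feb-1900 as a valid date, which it is not, since 1900 was not a leap year. We therefore subtract 2 days from Excel's nominal origin of 1900-01-01 (1 day for the indexing difference and 1 day for the Excel bug) to get the correct origin, 1899-12-30. This is exact for serial numbers from 61 (1900-03-01) onward, which covers every value in this dataset; serials below 60 come out one day early.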
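The origin arithmetic can be checked directly in R. This is a small sketch assuming Excel's default 1900 date system; `excel_origin` and `s` are illustrative names.

```r
# 1900-01-01 minus 1 day (Excel counts from 1) minus 1 day (the fake 1900-02-29)
excel_origin <- as.Date("1900-01-01") - 2
print(excel_origin)              # "1899-12-30"

# Offset between Excel's effective origin and R's 1970-01-01 origin, in days
print(as.numeric(excel_origin))  # -25569

# So an Excel serial s corresponds to R's internal day count s - 25569
s <- 40886
stopifnot(as.Date(s, origin = excel_origin) == as.Date(s - 25569, origin = "1970-01-01"))

# Caveat: the shortcut is exact only from serial 61 (1900-03-01) onward
print(as.Date(61, origin = excel_origin))  # "1900-03-01"
print(as.Date(1, origin = excel_origin))   # "1899-12-31", though Excel shows 1900-01-01
```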

Output for the first 5 rows:

> xlnew
             D         D1         D2         D3
1   1991-09-02 2011-12-09 2011-06-01 2011-10-01
2   1952-08-07 2011-12-13 1985-06-07 2011-10-01
3   1951-11-15 2012-02-05 1993-06-15 2012-01-05
4   1969-03-04 2012-02-08 2010-08-01 2012-02-01
5   1939-12-02 2012-02-02 2009-07-01 2012-02-01

Answer 1 (score: 4)

@PKumar showed how to use the as.Date function, but created a new data frame.

To replace a subset of the columns in the original data frame instead, you can do:

xl[date_cols] <- lapply(xl[date_cols], as.Date, origin="1899-12-30")
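A small self-contained sketch of this pattern, using a hypothetical data frame `raw` with one non-date column, shows that only the listed columns change. It also writes the same conversion as an explicit loop, closer to the question's attempt; `raw`, `df` and `df_loop` are illustrative names.

```r
# Hypothetical data: two Excel-serial columns plus an ordinary integer column
raw <- data.frame(id = 1:3,
                  D1 = c(40886, 40890, 40944),
                  D2 = c(40695, 31205, NA))
date_cols <- c("D1", "D2")  # column positions such as c(2, 3) work too

# Replace only the listed columns; `id` is left untouched
df <- raw
df[date_cols] <- lapply(df[date_cols], as.Date, origin = "1899-12-30")

# The same conversion as an explicit loop
df_loop <- raw
for (j in date_cols) {
  df_loop[[j]] <- as.Date(df_loop[[j]], origin = "1899-12-30")
}

stopifnot(isTRUE(all.equal(df, df_loop)))  # both approaches agree
str(df)
```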

Answer 2 (score: 1)

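My personal preference is the data.table package: it is fast, its syntax is concise, and it makes it easy to modify columns by reference. For large datasets this is very efficient. I would do it like this:

Option 1: write the column names directly in the := assignment and in .SDcols.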

library(data.table)

setDT(xl) # Convert the data.frame to data.table, by reference

xl[ , c("D", "D1", "D2", "D3") := lapply(.SD, as.Date, origin="1899-12-30"), .SDcols = c("D", "D1", "D2", "D3")]

Option 2: define the column names once in a vector and reuse it in the lapply call.

library(data.table)

setDT(xl) # Convert the data.frame to data.table, by reference

my.cols <- c("D", "D1", "D2", "D3")

xl[ , (my.cols) := lapply(.SD, as.Date, origin="1899-12-30"), .SDcols = my.cols]

Note that both options modify the existing data in place, so you don't need to assign the result to a new object.

I hope this helps.