Question

I have a huge data frame with a column containing many dates. Like this:

Date
2014-01-02
2014-01-02
2014-01-02
2014-01-03
2014-01-03
2014-02-01
2014-02-01
2014-02-02
2014-02-02

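I want to build an additional column containing the number of rows that fall in the same month as each date (note: the data spans several years, so there is more than one January, February, etc.). Like this: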

Date           Count
2014-01-02      5
2014-01-02      5
2014-01-02      5
2014-01-03      5
2014-01-03      5
2014-02-01      4
2014-02-01      4
2014-02-02      4
2014-02-02      4

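My current solution is poor. I use the filter option (from dplyr) to select a specific month and then count the rows. Because this takes a lot of time, and because I want to automate the job, I am looking for a more sustainable solution.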

Answer 1

If your dates are in POSIXlt format, the month is built in, and you can simply build a table and look the values up in it.

Date = as.POSIXlt(c('2014-01-02',
'2014-01-02',
'2014-01-02',
'2014-01-03',
'2014-01-03',
'2014-02-01',
'2014-02-01',
'2014-02-02',
'2014-02-02'))

table(Date$mon)[as.character(Date$mon)]
0 0 0 0 0 1 1 1 1 
5 5 5 5 5 4 4 4 4

The 0/1 row is just the column names. In POSIXlt, January is month 0, February is month 1, and so on.
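Because `Date$mon` holds only the month, the lookup above pools the same month across different years. Since the question mentions multi-year data, here is a minimal sketch of the same table-lookup idea keyed on year and month instead. The 2015 row and the names `ym` and `counts` are made up for illustration.

```r
# Sample dates, including one hypothetical 2015 row
Date <- as.POSIXlt(c('2014-01-02', '2014-01-02', '2014-01-03',
                     '2014-02-01', '2015-01-05'))

# Year-month key such as "2014-01"
ym <- format(Date, "%Y-%m")

# Count rows per key, then look up each row's key in that table
counts <- as.vector(table(ym)[ym])
counts
```

With this key, January 2014 and January 2015 are counted separately.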

Answer 2

Since you are already using dplyr ...

Answer 3

You can use base R:

d <- read.table(header=TRUE, stringsAsFactors = FALSE, text=
"Date
2014-01-02
2014-01-02
2014-01-02
2014-01-03
2014-01-03
2014-02-01
2014-02-01
2014-02-02
2014-02-02")

d$count <- ave(!is.na(d$Date), substr(d$Date, 1,7), FUN=sum)
d

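substr(d$Date, 1,7) extracts the first seven characters from the strings in d$Date (that is, the part containing the year and month, e.g. 2014-01). The result is used as the grouping variable in ave().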
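To make the grouping step concrete, here is a small sketch on three made-up dates. It shows the keys that substr() produces and uses `length` as an alternative counting function in ave(). The names `dates`, `key`, `key2`, and `n` are illustrative only.

```r
dates <- c("2014-01-02", "2014-01-03", "2014-02-01")

# Year-month part of each string: "2014-01" "2014-01" "2014-02"
key <- substr(dates, 1, 7)

# ave() applies FUN within each group and returns one value per row
n <- ave(seq_along(dates), key, FUN = length)
n

# If the column is of class Date rather than character,
# format() yields the same keys
key2 <- format(as.Date(dates), "%Y-%m")
```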

Here is a solution with data.table:

library("data.table")
D <- fread(
"Date
2014-01-02
2014-01-02
2014-01-02
2014-01-03
2014-01-03
2014-02-01
2014-02-01
2014-02-02
2014-02-02")

D[, count:=.N, substr(Date, 1, 7)]
D

Answer 4

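I have changed the example you posted so that the year of each month is taken into account, since you want to count them separately (as specified in your comment):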

df = read.table(text = "
Date
2014-01-02
2014-01-02
2014-01-02
2014-01-03
2015-01-03
2014-02-01
2014-02-01
2014-02-02
2015-02-02",
header = TRUE)

library(lubridate)
library(dplyr)

df %>%
  mutate(Date = ymd(Date)) %>%     # update to a datetime variable (if needed)
  group_by(Month = month(Date),    # for each month and year
           Year = year(Date)) %>%
  mutate(N = n()) %>%              # count number of rows/appearances
  ungroup() %>%                    # forget the grouping
  select(-Month, -Year)            # remove help variables

# # A tibble: 9 x 2
#         Date     N
#       <date> <int>
# 1 2014-01-02     4
# 2 2014-01-02     4
# 3 2014-01-02     4
# 4 2014-01-03     4
# 5 2015-01-03     1
# 6 2014-02-01     3
# 7 2014-02-01     3
# 8 2014-02-02     3
# 9 2015-02-02     1
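As a possible variation on the pipeline above, the two helper columns can be replaced by a single one by flooring each date to the first day of its month with lubridate's `floor_date()`. This is only a sketch on made-up data; `df2`, `MonthStart`, and `res` are hypothetical names.

```r
library(lubridate)
library(dplyr)

# Small made-up sample spanning two years
df2 <- data.frame(Date = ymd(c("2014-01-02", "2014-01-03",
                               "2015-01-03", "2014-02-01")))

res <- df2 %>%
  group_by(MonthStart = floor_date(Date, "month")) %>%  # one key per year-month
  mutate(N = n()) %>%                                   # rows per year-month
  ungroup() %>%
  select(-MonthStart)                                   # drop the helper column

res
```

The grouping is the same as with separate Month and Year columns, since the first day of a month identifies that month and year uniquely.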

Counting specific dates [R]
