Question

I'm new to R, so please bear with me.

Below is a simple table named Order.

   Col1   Col2    Col3

  hey    hi   july 12,2013
  hey    hi   june 12,2013
  hey    hi   April 12,2013
  hey    hi   April 14,2012

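I want to write a query that produces the result below in a new table. I assume I need a regular expression to match part of the string in Col3 and then count the matches.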

 july     june   April
  1         1      2

If anyone knows how to do this, please help.

Answer 1

You can use sub to extract the month name and table to count the frequencies:

dat <- read.table(text = "Col1   Col2    Col3
hey    hi   'july 12,2013'
hey    hi   'june 12,2013'
hey    hi   'April 12,2013'
hey    hi   'April 14,2012'", header = TRUE)

table(sub("^(\\w+) .*", "\\1", dat$Col3))

# April  july  june 
#     2     1     1
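Note that table sorts its result alphabetically, while the question lists the months in order of first appearance (july, june, April). A minimal sketch of one way to keep that order, assuming the same four Col3 values; the variable names col3, months and counts are only illustrative:

```r
# Same Col3 values as in the question (typed in directly for a self-contained example)
col3 <- c("july 12,2013", "june 12,2013", "April 12,2013", "April 14,2012")

# Extract the leading word (the month name) from each string
months <- sub("^(\\w+) .*", "\\1", col3)

# unique() keeps first-appearance order; using it as the factor levels
# makes table() report the counts in that order instead of alphabetically
counts <- table(factor(months, levels = unique(months)))
counts
# counts are july = 1, june = 1, April = 2, in that order
```

Because the month names are compared as plain strings, "April" and "april" would be counted separately; wrapping months in tolower() first would merge them if that matters for your data.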

How does sub("^(\\w+) .*", "\\1", dat$Col3) work?

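The function sub performs string replacement. The first quoted string is a regular expression. ^ matches the start of the string, \\w is a word character, and + means one or more of them. The space that follows is a literal space. .* matches any number of any characters. The parentheses create a group: the first (and only) group, (\\w+), matches the word characters at the start of the string. The second argument to sub, "\\1", replaces the whole matched string with the substring captured by the first group. In short: the whole string is replaced by its first word.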
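To see the pattern at work on individual strings, here is a small sketch. The input vector x is made up for illustration; its last element has no space, so the pattern cannot match it:

```r
# Two strings shaped like Col3, plus one that the pattern cannot match
x <- c("july 12,2013", "April 14,2012", "2013")

# "^(\\w+) .*" needs: start of string, one or more word characters, a space, then anything.
# "\\1" replaces the whole match with the captured first word.
first_word <- sub("^(\\w+) .*", "\\1", x)
first_word
# "july" "April" "2013"
```

When the pattern does not match, sub returns the input unchanged, which is why "2013" comes back as is. Such leftovers would show up as their own category in table, so it is worth checking the extracted values before counting.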

Answer 2

Data:

data <- read.table(text = "Col1   Col2    Col3
hey    hi   'july 12,2013'
hey    hi   'june 12,2013'
hey    hi   'April 12,2013'
hey    hi   'April 14,2012'", header = TRUE)

An answer using dates:

    # month() is not in base R; it is assumed here to come from the lubridate package
    library(lubridate)

    # convert Col3 to POSIXlt (%B matches month names in the current locale)
    data$Col3 <- as.POSIXlt(data$Col3, format="%B %d, %Y")

    ## group using table with POSIXlt month numbers (0 is January)
    table(data$Col3$mon)
    # 3 5 6 
    # 2 1 1 

    ## group using table with normal month numbers
    table(month(data$Col3))
    # 4 6 7 
    # 2 1 1

    ## group using aggregate with POSIXlt month numbers (0 is January)
    aggregate(data$Col1, by=list(data[,"Col3"]$mon), length)

    # result
    #   Group.1 x
    # 1       3 2
    # 2       5 1
    # 3       6 1

    ## group using aggregate with normal month numbers
    aggregate(data$Col1, by=list(month(data$Col3)), length)

    # result
    #   Group.1 x
    # 1       4 2
    # 2       6 1
    # 3       7 1

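PS: In POSIXlt, data$Col3$mon is zero-based, so January is 0 and April is 3 rather than the 4 you might expect. To get "normal" month numbers, use month(data$Col3). I only realized this after reading Ananda's comment.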
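The zero-based numbering can also be handled in base R by adding 1 to the mon component. A minimal sketch, assuming the same four dates written in ISO format so that parsing does not depend on the locale; the names d, zero_based and one_based are only illustrative:

```r
# The four dates from the question, written as ISO strings
d <- as.POSIXlt(c("2013-07-12", "2013-06-12", "2013-04-12", "2012-04-14"), tz = "UTC")

# The mon component counts from 0 (January) to 11 (December)
zero_based <- d$mon
zero_based
# 6 5 3 3

# Add 1 to get calendar month numbers, then index the built-in month.name vector
one_based <- zero_based + 1
month_labels <- month.name[one_based]
month_labels
# "July" "June" "April" "April"
```

Passing month_labels to table gives the same counts as the month() version without loading an extra package.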

If you want a prettier version (from Ananda Mahto):

    Col3 <- as.POSIXlt(data$Col3, format="%B %d, %Y")
    table(month.name[month(Col3)])

    # April  July  June 
    #     2     1     1
