grep: How to search my data in R using wildcards

Time: 2012-11-25 16:26:15

Tags: r grep wildcard

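I recently started using R, and I am now trying to extract some data with it. However, the results I get are very confusing. I have daily data from 1961 to 1963, with dates in the format 1961-04-25. I created a vector of these dates called date.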

To pick out the period from April 10 to May 21 in each year and display those dates, I tried using grep with this command:

date[date >= grep("196.-04-10", date, value = TRUE) & 
       date <= grep("196.-05-21", date, value = TRUE)] 

The result is confusing because it moves in steps of 3 days instead of giving me every day. See below.

[1] "1961-04-10" "1961-04-13" "1961-04-16" "1961-04-19" "1961-04-22" "1961-04-25" "1961-04-28" "1961-05-01" "1961-05-04" "1961-05-07" "1961-05-10"
[12] "1961-05-13" "1961-05-16" "1961-05-19" "1962-04-12" "1962-04-15" "1962-04-18" "1962-04-21" "1962-04-24" "1962-04-27" "1962-04-30" "1962-05-03"
[23] "1962-05-06" "1962-05-09" "1962-05-12" "1962-05-15" "1962-05-18" "1962-05-21" "1963-04-11" "1963-04-14" "1963-04-17" "1963-04-20" "1963-04-23"
[34] "1963-04-26" "1963-04-29" "1963-05-02" "1963-05-05" "1963-05-08" "1963-05-11" "1963-05-14" "1963-05-17" "1963-05-20"

2 answers:

Answer 0: (score: 2)

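I think the grep strategy is the wrong one, but maybe something like this would work. Basically, I compute the day of the year (the Julian date, via yday() from lubridate) and use that for the comparison.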

z <- as.Date(c("1961-04-10","1961-04-11","1961-04-12",
               "1961-05-21","1961-05-22","1961-05-23",
               "1963-04-09","1963-04-12","1963-05-21","1963-05-22"))
library(lubridate)
z[yday(z)>=yday(as.Date("1961-04-10")) & yday(z)<=yday(as.Date("1961-05-21"))]
## [1] "1961-04-10" "1961-04-11" "1961-04-12" "1961-05-21" "1963-04-12"
## [6] "1963-05-21"

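Actually, this solution is fragile with respect to leap years, because the same calendar date gets a different day-of-year number once February 29 has passed. Better (?):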

yz <- year(z)
z[z>=as.Date(paste0(yz,"-04-10")) & z<=as.Date(paste0(yz,"-05-21"))]
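A small check of the leap-year concern, offered as a sketch rather than a thorough test. The year 1964 is used only as a hypothetical leap year (the question's data stops in 1963), and base R's `%j` format stands in for `yday()` so the snippet needs no packages.

```r
# Hypothetical dates: one in a common year, two in a leap year
d <- as.Date(c("1961-04-10", "1964-04-09", "1964-04-10"))

# Day of year via base R's "%j" format (same numbers as lubridate::yday())
doy <- as.integer(format(d, "%j"))
doy  # 100 100 101

# Day-of-year comparison against the 1961 bound: wrongly keeps 1964-04-09
by_doy <- d[doy >= as.integer(format(as.Date("1961-04-10"), "%j"))]

# Per-year bounds built with paste0(): 1964-04-09 is correctly dropped
yr <- format(d, "%Y")
by_bounds <- d[d >= as.Date(paste0(yr, "-04-10")) &
               d <= as.Date(paste0(yr, "-05-21"))]
```

Here `by_doy` keeps all three dates, while `by_bounds` keeps only the two April 10 dates.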

(You should definitely test this yourself; I have not tested it carefully!)
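As for why the grep attempt in the question returns every third day, the likely cause is vector recycling. Here is a minimal sketch, assuming the asker's `date` is a character vector with one string per day; the name `dates` below is a stand-in for it.

```r
# Stand-in for the asker's `date` vector (assumed: one string per day, 1961-1963)
dates <- as.character(seq(as.Date("1961-01-01"), as.Date("1963-12-31"), by = "day"))

# Each grep() call returns three strings, one per year
lower <- grep("196.-04-10", dates, value = TRUE)
upper <- grep("196.-05-21", dates, value = TRUE)
lower  # "1961-04-10" "1962-04-10" "1963-04-10"

# The 3-element bounds are recycled along the 1095-element vector, so element i
# is compared with bound number ((i - 1) %% 3) + 1, which is usually the bound
# for a different year than the date itself
sel <- dates[dates >= lower & dates <= upper]
head(sel, 4)  # "1961-04-10" "1961-04-13" "1961-04-16" "1961-04-19"
```

Only every third element happens to be compared with the bounds for its own year, which reproduces the 3-day steps. R gives no warning here because 1095 is a multiple of 3.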

Answer 1: (score: 1)

Storing the variable as a proper Date and working with its formatted parts is probably your best bet here.

## set up some test data
datevar <- seq.Date(as.Date("1961-01-01"),as.Date("1963-12-31"),by="day")
test <- data.frame(date=datevar,id=1:(length(datevar)))
head(test)

## which looks like:
> head(test)
        date id
1 1961-01-01  1
2 1961-01-02  2
3 1961-01-03  3
4 1961-01-04  4
5 1961-01-05  5
6 1961-01-06  6

## find the date ranges you want
selectdates <-  
    (format(test$date,"%m") == "04" & as.numeric(format(test$date,"%d")) >= 10) |
    (format(test$date,"%m") == "05" & as.numeric(format(test$date,"%d")) <= 21)

## subset the original data
result <- test[selectdates,]

## which looks as expected:    
> result
          date  id
100 1961-04-10 100
101 1961-04-11 101
102 1961-04-12 102
103 1961-04-13 103
104 1961-04-14 104
105 1961-04-15 105
106 1961-04-16 106
107 1961-04-17 107
108 1961-04-18 108
109 1961-04-19 109
110 1961-04-20 110
111 1961-04-21 111
112 1961-04-22 112
113 1961-04-23 113
114 1961-04-24 114
115 1961-04-25 115
116 1961-04-26 116
117 1961-04-27 117
118 1961-04-28 118
119 1961-04-29 119
120 1961-04-30 120
121 1961-05-01 121
122 1961-05-02 122
123 1961-05-03 123
124 1961-05-04 124
125 1961-05-05 125
126 1961-05-06 126
127 1961-05-07 127
128 1961-05-08 128
129 1961-05-09 129
130 1961-05-10 130
131 1961-05-11 131
132 1961-05-12 132
133 1961-05-13 133
134 1961-05-14 134
135 1961-05-15 135
136 1961-05-16 136
137 1961-05-17 137
138 1961-05-18 138
139 1961-05-19 139
140 1961-05-20 140
141 1961-05-21 141
465 1962-04-10 465
...
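The same month/day test could also be sketched more compactly by comparing zero-padded "%m-%d" strings directly. This is an alternative formulation, not part of the answer above; the names `md`, `keep` and `result2` are made up for illustration.

```r
# Same test data as above
datevar <- seq.Date(as.Date("1961-01-01"), as.Date("1963-12-31"), by = "day")
test <- data.frame(date = datevar, id = seq_along(datevar))

# Zero-padded "mm-dd" strings sort in calendar order within a year,
# so a plain string comparison selects April 10 through May 21
md <- format(test$date, "%m-%d")
keep <- md >= "04-10" & md <= "05-21"

# Subset the original data
result2 <- test[keep, ]
nrow(result2)  # 126, i.e. 42 days in each of the 3 years
```

This only works for a window that does not wrap around the new year; a range such as December to February would need the two comparisons joined with `|` instead of `&`.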