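Filtering a data frame with multiple conditions

Time: 2011-07-27 14:47:37

Tags: r dataframe

I'm looking to subset a data frame in R. My syntax is obviously wrong (i.e. it yields incorrect results).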

    data[i,]$m_cnt <- nrow(w_data[
        w_data$direction >= data[i,]$min_a &
        w_data$direction < data[i,]$max_a & 
        w_data$windspeed >= 3 & 
        w_data$windspeed < 15,
    ])/records;

Similar question: Filtering a data.frame

The w_data data.frame (simplified for brevity) consists of wind speed and direction time-series data.

time_stamp          windspeed    direction
2010-06-01 00:00    12.2          125
2010-06-03 02:50    17.4          312
2010-06-05 21:30    2.1           132
2010-06-12 15:10    7.8           71
2010-06-22 17:40    2.6           307
2010-06-30 03:20    5.1           310

The R statement above should count the number of records that fall within a particular wind direction range, e.g. >=120° and <135°, and within a certain wind speed range, in this example <15m/s and >=3m/s. The count is then converted to a percentage of the total number of measurements taken, so the above example should equal 1 record out of 6 = 16.66%. The percentage is then recorded in another data.frame (data) with the following structure:

min_a    max_a    l_cnt    m_cnt    h_cnt
0        15       0        0        0
15       30       0        0        0
30       45       0        0        0
45       60       0        0        0
60       75       0        0.1666   0
75       90       0        0        0
90       105      0        0        0
105      120      0        0        0
120      135      0.1666   0.1666   0
135      150      0        0        0
150      165      0        0        0
165      180      0        0        0
180      195      0        0        0
195      210      0        0        0
210      225      0        0        0
225      240      0        0        0
240      255      0        0        0
255      270      0        0        0
270      285      0        0        0
285      300      0        0        0
300      315      0.1666   0.1666   0.1666
315      330      0        0        0
330      345      0        0        0
345      360      0        0        0

The problem I have is that the sum of all the percentages does not equal 100% (it does for this example, but not for my script, which has more than 10,000 records).

I have also experienced odd results, e.g.:

    data[i,]$l_cnt <- nrow(w_data[
        w_data$direction >= data[i,]$min_a &
        w_data$direction < data[i,]$max_a &
        w_data$windspeed <= 3,
    ])/records;

    data[i,]$m_cnt <- nrow(w_data[
        w_data$direction >= data[i,]$min_a &
        w_data$direction < data[i,]$max_a &
        w_data$windspeed <= 15,
    ])/records;

    data[i,]$h_cnt <- nrow(w_data[
        w_data$direction >= data[i,]$min_a &
        w_data$direction < data[i,]$max_a &
        w_data$windspeed > 15,
    ])/records;

Yields totals:

l_cnt    0,360637343
m_cnt    0,187836625
h_cnt    0,811938959
total    1,360412926

However, if I use the greater-than/less-than qualifications for the m_cnt calculation, i.e.:

    data[i,]$m_cnt <- nrow(w_data[
        w_data$direction >= data[i,]$min_a &
        w_data$direction < data[i,]$max_a &
        w_data$windspeed >= 3 &
        w_data$windspeed < 15,
    ])/records;

I get:

1 Answer:

Answer 0 (score: 3)

This is probably close to what you want:

> # data
> w_data
  windspeed direction
1      12.2       125
2      17.4       312
3       2.1       132
4       7.8        71
5       2.6       307
6       5.1       310

> # grouping by cut
> w_data <- transform(w_data,
+                     dg = cut(direction, breaks=0:24*15),
+                     wg = cut(windspeed, breaks=c(0, 3, 15, Inf)))

> # now the data looks like:
> w_data
  windspeed direction        dg       wg
1      12.2       125 (120,135]   (3,15]
2      17.4       312 (300,315] (15,Inf]
3       2.1       132 (120,135]    (0,3]
4       7.8        71   (60,75]   (3,15]
5       2.6       307 (300,315]    (0,3]
6       5.1       310 (300,315]   (3,15]
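
One detail worth checking at this step: by default `cut()` builds intervals that are closed on the right, `(a, b]`, whereas the question's conditions (`>= min_a` and `< max_a`) describe intervals closed on the left, `[a, b)`. The sketch below is not part of the original answer; it uses a few hypothetical boundary values to compare the default with `right = FALSE`.

```r
# Hypothetical boundary values, chosen only to show how interval closure matters.
direction <- c(0, 120, 125, 135)
breaks <- seq(0, 360, by = 15)

# Default (right = TRUE): intervals are (a, b], so 120 lands in (105,120]
# and 0 is not covered by any interval, which makes it NA.
default_bins <- cut(direction, breaks = breaks)

# right = FALSE: intervals are [a, b), matching `>= min_a & < max_a`.
left_closed_bins <- cut(direction, breaks = breaks, right = FALSE)

# Side-by-side comparison of the two binnings.
data.frame(direction, default_bins, left_closed_bins)
```

With `right = FALSE` a direction of exactly 360 would fall outside the last interval and become `NA`, so real data may need that value handled separately (for example by mapping 360 to 0 first).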

> # tabulate and calculate the percentage
> table(w_data$dg, w_data$wg) / nrow(w_data)

                (0,3]    (3,15]  (15,Inf]
  (0,15]    0.0000000 0.0000000 0.0000000
  (15,30]   0.0000000 0.0000000 0.0000000
  (30,45]   0.0000000 0.0000000 0.0000000
  (45,60]   0.0000000 0.0000000 0.0000000
  (60,75]   0.0000000 0.1666667 0.0000000
  (75,90]   0.0000000 0.0000000 0.0000000
  (90,105]  0.0000000 0.0000000 0.0000000
  (105,120] 0.0000000 0.0000000 0.0000000
  (120,135] 0.1666667 0.1666667 0.0000000
  (135,150] 0.0000000 0.0000000 0.0000000
  (150,165] 0.0000000 0.0000000 0.0000000
  (165,180] 0.0000000 0.0000000 0.0000000
  (180,195] 0.0000000 0.0000000 0.0000000
  (195,210] 0.0000000 0.0000000 0.0000000
  (210,225] 0.0000000 0.0000000 0.0000000
  (225,240] 0.0000000 0.0000000 0.0000000
  (240,255] 0.0000000 0.0000000 0.0000000
  (255,270] 0.0000000 0.0000000 0.0000000
  (270,285] 0.0000000 0.0000000 0.0000000
  (285,300] 0.0000000 0.0000000 0.0000000
  (300,315] 0.1666667 0.1666667 0.1666667
  (315,330] 0.0000000 0.0000000 0.0000000
  (330,345] 0.0000000 0.0000000 0.0000000
  (345,360] 0.0000000 0.0000000 0.0000000
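
Because `cut()` assigns each value to exactly one bin, the proportions in such a table add up to 1, provided no value falls outside the breaks. In the question's second set of conditions, by contrast, every record with `windspeed <= 3` also satisfies `windspeed <= 15`, so it is counted in both `l_cnt` and `m_cnt`. The following is a minimal sketch on the six sample rows, not the original poster's script. The class boundaries `[0, 3)`, `[3, 15)`, `[15, Inf)` and the reshaping into the `min_a`/`max_a` layout are assumptions made for illustration.

```r
# Sample rows from the question (time stamps omitted).
w_data <- data.frame(
  windspeed = c(12.2, 17.4, 2.1, 7.8, 2.6, 5.1),
  direction = c(125, 312, 132, 71, 307, 310)
)
records <- nrow(w_data)

# Overlapping classes, as in the question's second snippet:
# rows with windspeed <= 3 also satisfy windspeed <= 15.
overlapping <- c(
  l_cnt = sum(w_data$windspeed <= 3),
  m_cnt = sum(w_data$windspeed <= 15),
  h_cnt = sum(w_data$windspeed > 15)
) / records
sum(overlapping)  # exceeds 1 because the low class is counted twice

# Mutually exclusive classes via cut(); right = FALSE gives [a, b) intervals.
# The speed class boundaries are an assumption based on the m_cnt condition.
dg <- cut(w_data$direction, breaks = seq(0, 360, by = 15), right = FALSE)
wg <- cut(w_data$windspeed, breaks = c(0, 3, 15, Inf), right = FALSE,
          labels = c("l_cnt", "m_cnt", "h_cnt"))
prop <- table(dg, wg) / records
sum(prop)  # 1, up to floating-point rounding

# Reshape into the layout of the question's `data` data frame.
lower <- seq(0, 345, by = 15)
data <- data.frame(min_a = lower, max_a = lower + 15,
                   as.data.frame.matrix(prop))
rownames(data) <- NULL

# Show only the direction bins that contain at least one record.
data[rowSums(data[, c("l_cnt", "m_cnt", "h_cnt")]) > 0, ]
```

If the real data contains missing values, note that `nrow(w_data[cond, ])` also counts rows where `cond` is `NA`, while `sum(cond, na.rm = TRUE)` does not; whether that applies to the 10,000-record data set cannot be told from the question.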