Looping over column values to capture non-zero values

Time: 2018-05-29 06:34:27

Tags: r for-loop

I am trying to write a small for loop in R that is giving me trouble.
My data has the following structure (with thousands of records):

       City  Street    Time  Name Value
1  New York Street1  Week 1  John     0
2  New York Street1  Week 2  John     0
3  New York Street1  Week 3 James     0
4  New York Street1  Week 3 James     5
5  New York Street2  Week 4  Kate     0
6  New York Street2  Week 4  Kate     3
7  New York Street4  Week 7  Kate     0
8  New York Street4  Week 8  Kate     0
9  New York Street4  Week 9  John     0
10   Boston Street1  Week 1 James     0
11   Boston Street1  Week 2 James     0
12   Boston Street1  Week 3  John     0
13   Boston Street1  Week 4  Kate     0
14   Boston Street1  Week 5  John     0
15   Boston Street1  Week 6  Kate     0
16   Boston Street1  Week 7  Kate     0
17   Boston Street1  Week 8 James     0
18   Boston Street1  Week 9 James     0
19   Boston Street1 Week 10  Kate     2
20   Boston Street5 Week 11  John     0
21   Boston Street5 Week 12  Kate     3
22   Boston Street5 Week 13  Kate     0

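For each City/Street combination, I am trying to find the first week with a non-zero Value, replace every Name in that City/Street combination before that occurrence with "-", and then move on to the next City/Street combination. Combinations with no non-zero Value should be left unchanged.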

I think my output should look like this.

       City  Street    Time  Name Value
1  New York Street1  Week 1     -     0
2  New York Street1  Week 2     -     0
3  New York Street1  Week 3     -     0
4  New York Street1  Week 3 James     5
5  New York Street2  Week 4     -     0
6  New York Street2  Week 4  Kate     3
7  New York Street4  Week 7  Kate     0
8  New York Street4  Week 8  Kate     0
9  New York Street4  Week 9  John     0
10   Boston Street1  Week 1     -     0
11   Boston Street1  Week 2     -     0
12   Boston Street1  Week 3     -     0
13   Boston Street1  Week 4     -     0
14   Boston Street1  Week 5     -     0
15   Boston Street1  Week 6     -     0
16   Boston Street1  Week 7     -     0
17   Boston Street1  Week 8     -     0
18   Boston Street1  Week 9     -     0
19   Boston Street1 Week 10  Kate     2
20   Boston Street5 Week 11     -     0
21   Boston Street5 Week 12  Kate     3
22   Boston Street5 Week 13  Kate     0

I tried a simple for loop, but it iterates over row numbers rather than over City/Street names.

Can you help?

Data

my_data <- 
structure(list(City = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Boston", 
"New York"), class = "factor"), Street = structure(c(1L, 1L, 
1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 4L, 4L, 4L), .Label = c("Street1", "Street2", "Street4", 
"Street5"), class = "factor"), Time = structure(c(1L, 6L, 7L, 
7L, 8L, 8L, 11L, 12L, 13L, 1L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
13L, 2L, 3L, 4L, 5L), .Label = c("Week 1", "Week 10", "Week 11", 
"Week 12", "Week 13", "Week 2", "Week 3", "Week 4", "Week 5", 
"Week 6", "Week 7", "Week 8", "Week 9"), class = "factor"), Name = structure(c(2L, 
2L, 1L, 1L, 3L, 3L, 3L, 3L, 2L, 1L, 1L, 2L, 3L, 2L, 3L, 3L, 1L, 
1L, 3L, 2L, 3L, 3L), .Label = c("James", "John", "Kate"), class = "factor"), 
    Value = c(0L, 0L, 0L, 5L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 3L, 0L)), class = "data.frame", row.names = c(NA, 
-22L))

expected_output <- 
structure(list(City = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Boston", 
"New York"), class = "factor"), Street = structure(c(1L, 1L, 
1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 4L, 4L, 4L), .Label = c("Street1", "Street2", "Street4", 
"Street5"), class = "factor"), Time = structure(c(1L, 6L, 7L, 
7L, 8L, 8L, 11L, 12L, 13L, 1L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
13L, 2L, 3L, 4L, 5L), .Label = c("Week 1", "Week 10", "Week 11", 
"Week 12", "Week 13", "Week 2", "Week 3", "Week 4", "Week 5", 
"Week 6", "Week 7", "Week 8", "Week 9"), class = "factor"), Name = structure(c(2L, 
2L, 1L, 1L, 3L, 3L, 3L, 3L, 2L, 1L, 1L, 2L, 3L, 2L, 3L, 3L, 1L, 
1L, 3L, 2L, 3L, 3L), .Label = c("James", "John", "Kate"), class = "factor"), 
    Value = c(0L, 0L, 0L, 5L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 3L, 0L)), class = "data.frame", row.names = c(NA, 
-22L))

1 Answer:

Answer 0 (score: 0)

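With data.table, convert the data.frame to a data.table (setDT(my_data)) and convert 'Name' to character class. If 'Name' needs to stay a factor, '-' must first be added as one of its levels before assigning. Then, grouped by 'City' and 'Street', if any 'Value' in that group is non-zero, get the row indices (.I) where the cumulative sum of the logical vector (Value != 0) is less than 1, and assign '-' to 'Name' for those rows.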

library(data.table)
setDT(my_data)[, Name := as.character(Name)]
i1 <-  my_data[, if(any(Value !=0)) .I[cumsum(Value !=0) < 1] , 
                  .(City, Street)]$V1
my_data[i1, Name := '-']
#         City  Street    Time  Name Value
# 1: New York Street1  Week 1     -     0
# 2: New York Street1  Week 2     -     0
# 3: New York Street1  Week 3     -     0
# 4: New York Street1  Week 3 James     5
# 5: New York Street2  Week 4     -     0
# 6: New York Street2  Week 4  Kate     3
# 7: New York Street4  Week 7  Kate     0
# 8: New York Street4  Week 8  Kate     0
# 9: New York Street4  Week 9  John     0
#10:   Boston Street1  Week 1     -     0
#11:   Boston Street1  Week 2     -     0
#12:   Boston Street1  Week 3     -     0
#13:   Boston Street1  Week 4     -     0
#14:   Boston Street1  Week 5     -     0
#15:   Boston Street1  Week 6     -     0
#16:   Boston Street1  Week 7     -     0
#17:   Boston Street1  Week 8     -     0
#18:   Boston Street1  Week 9     -     0
#19:   Boston Street1 Week 10  Kate     2
#20:   Boston Street5 Week 11     -     0
#21:   Boston Street5 Week 12  Kate     3
#22:   Boston Street5 Week 13  Kate     0
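The key step is the mask cumsum(Value != 0) < 1, which is TRUE only for rows that come before the first non-zero value in a group. A minimal base R sketch of that mask on a single toy group follows; the vectors value and name are made-up illustration data, not part of the original question.

```r
# Toy Value and Name columns for one City/Street group (hypothetical data)
value <- c(0L, 0L, 5L, 0L, 3L)
name  <- c("John", "John", "James", "Kate", "Kate")

nonzero      <- value != 0       # FALSE FALSE TRUE FALSE TRUE
seen         <- cumsum(nonzero)  # running count of non-zero values: 0 0 1 1 2
before_first <- seen < 1         # TRUE only before the first non-zero value

# Only replace names when the group contains a non-zero value at all
if (any(nonzero)) name[before_first] <- "-"

print(name)
# [1] "-"     "-"     "James" "Kate"  "Kate"
```

Zeros that occur after the first non-zero value keep their names, because the running count never drops back below 1.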

A similar option with tidyverse is

library(tidyverse)
my_data %>% 
    mutate(Name = as.character(Name)) %>% 
    group_by(City, Street) %>% 
    mutate(Name = if(any(Value!=0)) 
            replace(Name, cumsum(Value != 0) < 1, '-') else Name)
# A tibble: 22 x 5
# Groups:   City, Street [5]
#   City     Street  Time   Name  Value
#   <fct>    <fct>   <fct>  <chr> <int>
# 1 New York Street1 Week 1 -         0
# 2 New York Street1 Week 2 -         0
# 3 New York Street1 Week 3 -         0
# 4 New York Street1 Week 3 James     5
# 5 New York Street2 Week 4 -         0
# 6 New York Street2 Week 4 Kate      3
# 7 New York Street4 Week 7 Kate      0
# 8 New York Street4 Week 8 Kate      0
# 9 New York Street4 Week 9 John      0
#10 Boston   Street1 Week 1 -         0
# ... with 12 more rows
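Since the question asked for a for loop over City/Street combinations, here is a base R sketch of that approach for comparison. It is not part of the original answer. It assumes the rows are already sorted by time within each group, as in the question, and it rebuilds the question's data compactly with rep() instead of the dput output. It also keeps 'Name' as a factor by adding '-' as a level first, which is the alternative path mentioned in the answer. The variable names key, rows and hit are illustrative.

```r
# Rebuild the question's 22 rows compactly (same values as my_data)
my_data <- data.frame(
  City   = rep(c("New York", "Boston"), times = c(9, 13)),
  Street = rep(c("Street1", "Street2", "Street4", "Street1", "Street5"),
               times = c(4, 2, 3, 10, 3)),
  Time   = paste("Week", c(1, 2, 3, 3, 4, 4, 7, 8, 9, 1:10, 11:13)),
  Name   = factor(c("John", "John", "James", "James", "Kate", "Kate",
                    "Kate", "Kate", "John",
                    "James", "James", "John", "Kate", "John", "Kate",
                    "Kate", "James", "James", "Kate",
                    "John", "Kate", "Kate")),
  Value  = c(0, 0, 0, 5, 0, 3, 0, 0, 0, rep(0, 9), 2, 0, 3, 0)
)

# Name stays a factor, so "-" has to be a valid level before assignment
levels(my_data$Name) <- c(levels(my_data$Name), "-")

# One key per row identifying its City/Street combination
key <- interaction(my_data$City, my_data$Street, drop = TRUE)

# Loop over the combinations, not over row numbers
for (k in levels(key)) {
  rows <- which(key == k)                  # row numbers of this group
  hit  <- which(my_data$Value[rows] != 0)  # positions of non-zero values
  if (length(hit) > 0) {
    # replace every name before the first non-zero value in this group
    my_data$Name[rows[seq_len(hit[1] - 1)]] <- "-"
  }
}

print(my_data)
```

Groups without any non-zero value (New York/Street4 here) are skipped by the length(hit) > 0 check, matching the if(any(Value != 0)) guard in the answers above. On thousands of records the grouped data.table or dplyr versions are likely to be faster than an explicit loop.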