I'm trying to write a small for loop in R and it's giving me trouble.
My data has the following structure (with thousands of records):
City Street Time Name Value
1 New York Street1 Week 1 John 0
2 New York Street1 Week 2 John 0
3 New York Street1 Week 3 James 0
4 New York Street1 Week 3 James 5
5 New York Street2 Week 4 Kate 0
6 New York Street2 Week 4 Kate 3
7 New York Street4 Week 7 Kate 0
8 New York Street4 Week 8 Kate 0
9 New York Street4 Week 9 John 0
10 Boston Street1 Week 1 James 0
11 Boston Street1 Week 2 James 0
12 Boston Street1 Week 3 John 0
13 Boston Street1 Week 4 Kate 0
14 Boston Street1 Week 5 John 0
15 Boston Street1 Week 6 Kate 0
16 Boston Street1 Week 7 Kate 0
17 Boston Street1 Week 8 James 0
18 Boston Street1 Week 9 James 0
19 Boston Street1 Week 10 Kate 2
20 Boston Street5 Week 11 John 0
21 Boston Street5 Week 12 Kate 3
22 Boston Street5 Week 13 Kate 0
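For each City/Street combination, I'm trying to find the first week with a non-zero Value, then remove all Names for that City/Street combination before that occurrence, and then move on to the next City/Street combination.
I think my output should look like this.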
City Street Time Name Value
1 New York Street1 Week 1 - 0
2 New York Street1 Week 2 - 0
3 New York Street1 Week 3 - 0
4 New York Street1 Week 3 James 5
5 New York Street2 Week 4 - 0
6 New York Street2 Week 4 Kate 3
7 New York Street4 Week 7 Kate 0
8 New York Street4 Week 8 Kate 0
9 New York Street4 Week 9 John 0
10 Boston Street1 Week 1 - 0
11 Boston Street1 Week 2 - 0
12 Boston Street1 Week 3 - 0
13 Boston Street1 Week 4 - 0
14 Boston Street1 Week 5 - 0
15 Boston Street1 Week 6 - 0
16 Boston Street1 Week 7 - 0
17 Boston Street1 Week 8 - 0
18 Boston Street1 Week 9 - 0
19 Boston Street1 Week 10 Kate 2
20 Boston Street5 Week 11 - 0
21 Boston Street5 Week 12 Kate 3
22 Boston Street5 Week 13 Kate 0
I tried a simple for loop, but it loops over row numbers without taking the City/Street names into account.
Can you help?
Data
my_data <-
structure(list(City = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Boston",
"New York"), class = "factor"), Street = structure(c(1L, 1L,
1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 4L, 4L, 4L), .Label = c("Street1", "Street2", "Street4",
"Street5"), class = "factor"), Time = structure(c(1L, 6L, 7L,
7L, 8L, 8L, 11L, 12L, 13L, 1L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 2L, 3L, 4L, 5L), .Label = c("Week 1", "Week 10", "Week 11",
"Week 12", "Week 13", "Week 2", "Week 3", "Week 4", "Week 5",
"Week 6", "Week 7", "Week 8", "Week 9"), class = "factor"), Name = structure(c(2L,
2L, 1L, 1L, 3L, 3L, 3L, 3L, 2L, 1L, 1L, 2L, 3L, 2L, 3L, 3L, 1L,
1L, 3L, 2L, 3L, 3L), .Label = c("James", "John", "Kate"), class = "factor"),
Value = c(0L, 0L, 0L, 5L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 3L, 0L)), class = "data.frame", row.names = c(NA,
-22L))
expected_output <-
structure(list(City = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Boston",
"New York"), class = "factor"), Street = structure(c(1L, 1L,
1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 4L, 4L, 4L), .Label = c("Street1", "Street2", "Street4",
"Street5"), class = "factor"), Time = structure(c(1L, 6L, 7L,
7L, 8L, 8L, 11L, 12L, 13L, 1L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 2L, 3L, 4L, 5L), .Label = c("Week 1", "Week 10", "Week 11",
"Week 12", "Week 13", "Week 2", "Week 3", "Week 4", "Week 5",
"Week 6", "Week 7", "Week 8", "Week 9"), class = "factor"), Name = structure(c(2L,
2L, 1L, 1L, 3L, 3L, 3L, 3L, 2L, 1L, 1L, 2L, 3L, 2L, 3L, 3L, 1L,
1L, 3L, 2L, 3L, 3L), .Label = c("James", "John", "Kate"), class = "factor"),
Value = c(0L, 0L, 0L, 5L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 3L, 0L)), class = "data.frame", row.names = c(NA,
-22L))
Answer 0 (score: 0)
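With data.table, convert the data.frame to a data.table (setDT(my_data)) and convert 'Name' to character class (if it needs to stay a factor, then '-' has to be added as one of the levels of 'Name' before the assignment). Grouped by 'City' and 'Street', if there is any 'Value' in the group that is not zero, get the row indices (.I) where the cumulative sum of the logical vector (Value != 0) is less than 1, i.e. the rows that come before the first non-zero 'Value', and assign '-' to 'Name' for those rows.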
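library(data.table)
# Convert to data.table by reference and make Name a character column
setDT(my_data)[, Name := as.character(Name)]
# Row indices before the first non-zero Value, only for groups that have one
i1 <- my_data[, if (any(Value != 0)) .I[cumsum(Value != 0) < 1],
              by = .(City, Street)]$V1
# Replace the names on those rows
my_data[i1, Name := '-']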
# City Street Time Name Value
# 1: New York Street1 Week 1 - 0
# 2: New York Street1 Week 2 - 0
# 3: New York Street1 Week 3 - 0
# 4: New York Street1 Week 3 James 5
# 5: New York Street2 Week 4 - 0
# 6: New York Street2 Week 4 Kate 3
# 7: New York Street4 Week 7 Kate 0
# 8: New York Street4 Week 8 Kate 0
# 9: New York Street4 Week 9 John 0
#10: Boston Street1 Week 1 - 0
#11: Boston Street1 Week 2 - 0
#12: Boston Street1 Week 3 - 0
#13: Boston Street1 Week 4 - 0
#14: Boston Street1 Week 5 - 0
#15: Boston Street1 Week 6 - 0
#16: Boston Street1 Week 7 - 0
#17: Boston Street1 Week 8 - 0
#18: Boston Street1 Week 9 - 0
#19: Boston Street1 Week 10 Kate 2
#20: Boston Street5 Week 11 - 0
#21: Boston Street5 Week 12 Kate 3
#22: Boston Street5 Week 13 Kate 0
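If 'Name' has to remain a factor, the note above about adding '-' to the levels could look roughly like the following. This is only a sketch on a small made-up table (dt and the Street9 rows are illustrative and not part of the original data); the grouping step is the same as above.

```r
library(data.table)

# Small illustrative table (hypothetical, not the asker's full data)
dt <- data.table(
  City   = c("Boston", "Boston", "Boston", "Boston"),
  Street = c("Street5", "Street5", "Street5", "Street9"),
  Name   = factor(c("John", "Kate", "Kate", "John")),
  Value  = c(0L, 3L, 0L, 0L)
)

# Keep Name as a factor, but add '-' as an extra level before assigning
dt[, Name := factor(Name, levels = c(levels(Name), "-"))]

# Row indices before the first non-zero Value, only for groups that have one
i1 <- dt[, if (any(Value != 0)) .I[cumsum(Value != 0) < 1],
         by = .(City, Street)]$V1

# Only row 1 qualifies here; Street9 has no non-zero Value and is left alone
dt[i1, Name := "-"]
```

Here the Street9 group is untouched because the if (any(...)) guard returns nothing for groups without a non-zero 'Value'.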
A similar option with tidyverse is
library(tidyverse)
my_data %>%
  mutate(Name = as.character(Name)) %>%
  group_by(City, Street) %>%
  mutate(Name = if (any(Value != 0))
    replace(Name, cumsum(Value != 0) < 1, '-') else Name)
# A tibble: 22 x 5
# Groups: City, Street [5]
# City Street Time Name Value
# <fct> <fct> <fct> <chr> <int>
# 1 New York Street1 Week 1 - 0
# 2 New York Street1 Week 2 - 0
# 3 New York Street1 Week 3 - 0
# 4 New York Street1 Week 3 James 5
# 5 New York Street2 Week 4 - 0
# 6 New York Street2 Week 4 Kate 3
# 7 New York Street4 Week 7 Kate 0
# 8 New York Street4 Week 8 Kate 0
# 9 New York Street4 Week 9 John 0
#10 Boston Street1 Week 1 - 0
# ... with 12 more rows
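To see why cumsum(Value != 0) < 1 picks out the rows before the first non-zero value, the same idea can be sketched in base R with ave(). This is not part of the answer above, just an illustration on a small subset of the data; the helper name before_first_nonzero and the data frame df are made up for the example.

```r
# The core trick on a single group's values
v <- c(0L, 0L, 0L, 5L, 0L)
cumsum(v != 0) < 1   # TRUE for every position before the first non-zero value

# Hypothetical helper: flags rows before the first non-zero value,
# and flags nothing when the group has no non-zero value at all
before_first_nonzero <- function(v) {
  if (any(v != 0)) cumsum(v != 0) < 1 else rep(FALSE, length(v))
}

# Small subset of the question's data, with Name as character
df <- data.frame(
  City   = rep("New York", 7),
  Street = c(rep("Street1", 4), rep("Street4", 3)),
  Name   = c("John", "John", "James", "James", "Kate", "Kate", "John"),
  Value  = c(0L, 0L, 0L, 5L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# ave() applies the helper per City/Street group; it returns the type of its
# first argument (integer here), so convert back to logical
mask <- as.logical(ave(df$Value, df$City, df$Street, FUN = before_first_nonzero))

df$Name[mask] <- "-"
```

In this subset, the first three Street1 rows get '-', while Street4 keeps its names because it never has a non-zero 'Value'.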