I have not found any way to check whether a categorical element of a vector lies between other categorical elements. Given a data frame:
id letter
1 B
2 A
3 B
4 B
5 C
6 B
7 A
8 B
9 C
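Everything I have found deals with numeric values and a general notion of order, not with the positions (indices) of elements within a specific vector.
I want to add a new Boolean column to the data frame (1 if B is between A and C; 0 if B is between C and A):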
id letter between
1 B 0
2 A NA
3 B 1
4 B 1
5 C NA
6 B 0
7 A NA
8 B 1
9 C NA
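For reproducibility, the example above can be written as R code. This is a minimal sketch; the object names df and expected are assumptions, not taken from the original post.

```r
# Example data from the question (letter stored as character, not factor)
df <- data.frame(
  id = 1:9,
  letter = c("B", "A", "B", "B", "C", "B", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Desired values of the new column, copied from the table above
expected <- c(0, NA, 1, 1, NA, 0, NA, 1, NA)
```

Every row whose letter is not B is NA in expected, which gives a quick sanity check for any candidate solution.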
Answer 0 (score: 1)
A combination of rle (run-length encoding) and zoo::rollapply is one option:
library(zoo)
d <- structure(list(id = 1:9,
                    letter = structure(c(2L, 1L, 2L, 2L, 3L, 2L, 1L, 2L, 3L),
                                       .Label = c("A", "B", "C"),
                                       class = "factor")),
               class = "data.frame", row.names = c(NA, -9L))
# Factor codes: A = 1, B = 2, C = 3
rl <- rle(as.numeric(d$letter))
# Pad with NA so that every run has a left and a right neighbour
rep(rollapply(c(NA, rl$values, NA),
              3,
              function(x) if (x[2] == 2)
                            ifelse(x[1] == 1 && x[3] == 3, 1, 0)
                          else NA),
    rl$lengths)
# [1] 0 NA 1 1 NA 0 NA 1 NA
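For readers without zoo installed, the same run-based idea can be sketched in base R. This is an illustration rather than the answerer's code: the helper name between_runs is hypothetical, and it always returns 0 for a run of B that touches either end of the vector, whereas the rollapply version can return NA there.

```r
# Hypothetical helper: classify each run of "B" by its neighbouring runs
between_runs <- function(letter) {
  rl <- rle(as.character(letter))
  # Pad with NA so the first and last runs also have two neighbours
  padded <- c(NA, rl$values, NA)
  per_run <- vapply(seq_along(rl$values), function(i) {
    w <- padded[i:(i + 2)]            # previous run, current run, next run
    if (w[2] != "B") return(NA_real_) # only runs of "B" get a value
    as.numeric(isTRUE(w[1] == "A") && isTRUE(w[3] == "C"))
  }, numeric(1))
  # Expand the per-run result back to one value per original element
  rep(per_run, rl$lengths)
}

between_runs(c("B", "A", "B", "B", "C", "B", "A", "B", "C"))
# [1]  0 NA  1  1 NA  0 NA  1 NA
```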
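Explanation
With rle you can identify blocks of consecutive values.
With rollapply you can "roll" a function with a given window size (here 3) over the vector.
rl$values contains the value of each run, so neighbouring elements always differ, and the function we apply to it is quite simple:
if element 2 is not a B, return NA;
if element 1 is an A and element 3 is a C, return 1, otherwise return 0.
Answer 1 (score: 1)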
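It is not clear from the question whether "A" and "C" must alternate, although this is implied because there is no coding for a "B" between "A" and "A", or vice versa. Assuming they do, for the vector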
x = c("B", "A", "B", "B", "C", "B", "A", "B", "C")
map to the numeric values c(A=1, B=0, C=-1)
and form the cumulative sum
v = cumsum(c(A=1, B=0, C=-1)[x])
(add 1 when an "A" is encountered, subtract 1 when a "C" is encountered). Replace the positions that are not "B" with NA
v[x != "B"] = NA
giving
> v
B A B B C B A B C
0 NA 1 1 NA 0 NA 1 NA
This can be captured as a function
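fun = function(x, map = c(A = 1, B = 0, C = -1)) {
    x = map[as.character(x)]  # index by name, so a factor input also works
    v = cumsum(x)             # running count of A's minus C's
    v[x != 0] = NA            # keep values only at the "B" positions
    v
}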
and used to transform a data.frame or tibble, e.g.
library(dplyr)
tibble(x) %>% mutate(v = fun(x))
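Because the cumulative sum only yields 0 and 1 when "A" and "C" alternate, it may be worth checking that assumption first. The sketch below is an addition, not part of the answer, and the variable names are hypothetical. It also checks that the first non-B letter is "A", since a leading "C" would make the running sum -1 for the B's that follow.

```r
x <- c("B", "A", "B", "B", "C", "B", "A", "B", "C")

# Drop the B's and look only at the sequence of A's and C's
ac <- x[x != "B"]

# Neighbouring entries must differ for A and C to alternate
alternates <- all(head(ac, -1) != tail(ac, -1))

# The running sum stays within {0, 1} only if the sequence starts with "A"
starts_with_a <- length(ac) == 0 || ac[1] == "A"

alternates && starts_with_a
# [1] TRUE
```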
Answer 2 (score: 0)
Another tidyverse possibility is:
library(dplyr)
library(tidyr)  # for fill()
df %>%
  group_by(grp = with(rle(letter), rep(seq_along(lengths), lengths))) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  mutate(res = ifelse(lag(letter, default = first(letter)) == "A" &
                        lead(letter, default = last(letter)) == "C", 1, 0)) %>%
  select(-letter, -grp) %>%
  full_join(df, by = c("id" = "id")) %>%
  arrange(id) %>%
  fill(res) %>%
  mutate(res = ifelse(letter != "B", NA, res))
id res letter
<int> <dbl> <chr>
1 1 0 B
2 2 NA A
3 3 1 B
4 4 1 B
5 5 NA C
6 6 0 B
7 7 NA A
8 8 1 B
9 9 NA C
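Here, it first groups by a run-length-type ID and keeps the first row of each group. Second, it checks the condition. Third, it performs a full join with the original df on the "id" column. Finally, it arranges by "id", fills the missing values, and assigns NA to the rows where "letter" != B.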
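The run-length-type ID used for grouping can be inspected on its own in base R. This is a small sketch on the question's data; the names grp and first_of_run are chosen for illustration.

```r
letter <- c("B", "A", "B", "B", "C", "B", "A", "B", "C")

# One ID per run of identical consecutive letters
grp <- with(rle(letter), rep(seq_along(lengths), lengths))
grp
# [1] 1 2 3 3 4 5 6 7 8

# Keeping the first row of each run collapses "B", "B" into a single "B"
first_of_run <- !duplicated(grp)
letter[first_of_run]
# [1] "B" "A" "B" "C" "B" "A" "B" "C"
```

After this collapse, the previous and next rows of a B are always the neighbouring A or C, which is why lag and lead are sufficient in the pipeline.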
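Answer 3 (score: 0)
Here is a solution that I hope is conceptually easy to follow. For "special" cases, such as a B at the top or bottom of the list, or a B with an A on both sides or a C on both sides, I set the value to 0.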
# Create dummy data - use your own instead
df <- data.frame(id = 1:100, letter = sample(c("A", "B", "C"), 100, replace = TRUE))
# Copy down info on whether A or C is above each B
acup <- df$letter
for (i in 2:nrow(df))
  if (df$letter[i] == "B")
    acup[i] <- acup[i - 1]
# Copy up info on whether A or C is below each B
acdown <- df$letter
for (i in (nrow(df) - 1):1)  # same indices as nrow(df):2 - 1, written explicitly
  if (df$letter[i] == "B")
    acdown[i] <- acdown[i + 1]
# Set appropriate values for column 'between'
df$between <- NA
df$between[acup == "A" & acdown == "C"] <- 1
df$between[df$letter == "B" & is.na(df$between)] <- 0 # Includes special cases
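To check this approach against the question's data rather than random input, the two loops can be wrapped in a function. This is a sketch under assumptions: the name between_by_fill is hypothetical and the input is converted to character first.

```r
# Hypothetical wrapper around the copy-down / copy-up idea
between_by_fill <- function(letter) {
  letter <- as.character(letter)
  n <- length(letter)
  up <- letter
  down <- letter
  if (n >= 2) {
    # Copy down: each B inherits the nearest non-B letter above it
    for (i in 2:n) if (letter[i] == "B") up[i] <- up[i - 1]
    # Copy up: each B inherits the nearest non-B letter below it
    for (i in (n - 1):1) if (letter[i] == "B") down[i] <- down[i + 1]
  }
  out <- rep(NA_real_, n)
  out[letter == "B"] <- 0                             # default, includes special cases
  out[letter == "B" & up == "A" & down == "C"] <- 1   # B between A and C
  out
}

between_by_fill(c("B", "A", "B", "B", "C", "B", "A", "B", "C"))
# [1]  0 NA  1  1 NA  0 NA  1 NA
```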
Answer 4 (score: -1)
You can use the lead and lag functions inside mutate to get the letters before and after each row, as follows:
library(dplyr)
df %>%
  mutate(letter_lag = lag(letter, 1),
         letter_lead = lead(letter, 1)) %>%
  mutate(between = case_when(letter_lag == "A" | letter_lead == "C" ~ 1,
                             letter_lag == "C" | letter_lead == "A" ~ 0,
                             TRUE ~ NA_real_)) %>%
  select(id, letter, between)
id letter between
1 1 B 0
2 2 A NA
3 3 B 1
4 4 B 1
5 5 C NA
6 6 B 0
7 7 A NA
8 8 B 1
9 9 C NA
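Note that lag and lead only look at the immediately adjacent rows, so this approach appears to match the expected output here because no run of B is longer than two. The base R sketch below applies the same two conditions to a run of three B's; prev and nxt are hypothetical stand-ins for lag() and lead(), and NA conditions are treated as FALSE, as case_when does.

```r
x <- c("A", "B", "B", "B", "C")

# Base R equivalents of lag(x, 1) and lead(x, 1)
prev <- c(NA, head(x, -1))
nxt <- c(tail(x, -1), NA)

# Same conditions as in the case_when call; %in% TRUE turns NA into FALSE
is_one <- (prev == "A" | nxt == "C") %in% TRUE
is_zero <- (prev == "C" | nxt == "A") %in% TRUE

between <- ifelse(is_one, 1, ifelse(is_zero, 0, NA))
between
# [1] NA  1 NA  1 NA
```

The middle B is left as NA although it lies between A and C, so for longer runs one of the run-based answers above is likely the safer choice.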