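I want to summarise sales data by city and add a column where "1" means "this city has at least one store with sales over 100" and "0" means "otherwise".
Here is an approximation of what I tried: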
library(dplyr)
my_data <- tibble(city = c("City 1","City 1","City 2","City 1"),
store = c("mc donalds","starbucks","target","jp licks"),
sales = c(300,200,3000,80),
sales_higher_than_100 = c(1,1,1,0))
my_data %>%
group_by(city) %>%
summarise(has_stores_that_sell_more_than_100 = sum(sales_higher_than_100))
# A tibble: 2 x 2
city has_stores_that_sell_more_than_100
<chr> <dbl>
1 City 1 2
2 City 2 1
However, I don't want a sum; I want to report the value 1 for "City 1". Like this:
# A tibble: 2 x 2
city has_stores_that_sell_more_than_100
<chr> <dbl>
1 City 1 1
2 City 2 1
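In other words, I want to know how to tell dplyr to check whether one or more rows for City N meet the condition, rather than counting every row for City N that meets it.
Answer 0 (score: 2)
You can put the logical test "> 0" inside the summarise call and then convert the result to numeric, where TRUE = 1 and FALSE = 0: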
my_data %>%
group_by(city) %>%
summarise(has_stores_that_sell_more_than_100 = as.numeric(sum(sales_higher_than_100)>0))
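A variant of the same idea, as a minimal sketch: any() asks directly whether at least one row in the group meets the condition, so the precomputed sales_higher_than_100 column is not needed. The "City 3" row is a hypothetical addition, not part of the question's data, included only to show a 0 result; the threshold is taken as strictly greater than 100, following the wording of the question.

```r
library(dplyr)

# Question's data plus one hypothetical row ("City 3") to show a 0 result
my_data <- tibble(
  city  = c("City 1", "City 1", "City 2", "City 1", "City 3"),
  store = c("mc donalds", "starbucks", "target", "jp licks", "corner shop"),
  sales = c(300, 200, 3000, 80, 50)
)

result <- my_data %>%
  group_by(city) %>%
  # any() is TRUE if at least one store in the city sells more than 100;
  # as.numeric() turns TRUE/FALSE into 1/0
  summarise(has_stores_that_sell_more_than_100 = as.numeric(any(sales > 100)))

print(result)
```

Because the test is written inside summarise(), changing the threshold only requires editing one expression.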
Answer 1 (score: 1)
Using ifelse:
my_data %>%
mutate(res = ifelse(sales >= 100, 1,
ifelse(sales < 100, 0, NA)))
Using if_else:
my_data %>%
mutate(res = if_else(sales >= 100, 1,
if_else(sales < 100, 0, NA_real_)))
Using base R:
Once you have the final data frame containing the sales values,
my_data$newcolumn <- as.numeric(my_data$sales >= 100)
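The base R approach gives a flag per row rather than per city. One way to collapse it to one value per city, as a sketch under the assumption that a row-level 0/1 flag is wanted first, is to take the group maximum with aggregate(): the maximum of a 0/1 column is 1 exactly when at least one row is flagged. The data frame below is a trimmed copy of the question's data, and per_city is a name chosen here for illustration.

```r
# Trimmed copy of the question's data (base R data frame, no packages needed)
my_data <- data.frame(
  city  = c("City 1", "City 1", "City 2", "City 1"),
  sales = c(300, 200, 3000, 80)
)

# Row-level flag: 1 if the store's sales are at least 100, otherwise 0
my_data$newcolumn <- as.numeric(my_data$sales >= 100)

# Per-city flag: the maximum of the 0/1 column is 1 if any row is flagged
per_city <- aggregate(newcolumn ~ city, data = my_data, FUN = max)

print(per_city)
```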
Using case_when:
my_data %>%
select(city:sales) %>%
mutate(
type = case_when(
sales > 100 ~ "True",
TRUE ~ "False"
)
)
See here: http://dplyr.tidyverse.org/reference/case_when.html
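Using derivedFactor (from the mosaic package):
library(dplyr)
library(mosaic)
# salesvolume stands for the column that holds the number of sales (sales in my_data above)
# as.character() comes first so that the labels "1"/"0" are converted, not the factor's level codes
my_data <- mutate(my_data, res = as.numeric(as.character(derivedFactor(
"1" = (salesvolume >= 100),
"0" = (salesvolume < 100),
.method = "first",
.default = NA
))))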