I have a data frame with two variables, as follows:
df <- data.frame(group=c(1,1,1,2,2,3,3,4),
type=c("a","b","a", "b", "c", "c","b","a"))
> df
group type
1 1 a
2 1 b
3 1 a
4 2 b
5 2 c
6 3 c
7 3 b
8 4 a
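I would like to generate a table that shows, for each group, the combination of types it has in the data frame as a single variable, e.g.: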
group alltypes
1 1 a, b
2 2 b, c
3 3 b, c
4 4 a
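The output should always list the types in the same order (e.g. groups 2 and 3 get the same result) and should contain no repeats (e.g. group 1 is not "a, b, a").
I tried doing this with dplyr and summarise, but I can't work out how to make it meet these two conditions. The code I tried was: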
> df %>%
+ group_by(group) %>%
+ summarise(
+ alltypes = paste(type, collapse=", ")
+ )
# A tibble: 4 × 2
group alltypes
<dbl> <chr>
1 1 a, b, a
2 2 b, c
3 3 c, b
4 4 a
I also tried turning type into a set of individual counts, but I'm not sure whether that is actually useful:
> df %>%
+ group_by(group, type) %>%
+ tally %>%
+ spread(type, n, fill=0)
Source: local data frame [4 x 4]
Groups: group [4]
group a b c
* <dbl> <dbl> <dbl> <dbl>
1 1 2 1 0
2 2 0 1 1
3 3 0 1 1
4 4 1 0 0
Any suggestions would be greatly appreciated.
Answer 0 (score: 4)
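I think you were very close. You can call the sort and unique functions to make sure your result meets both conditions: unique drops the repeated types within each group, and sort puts the remaining ones in a consistent order. For example:
df %>% group_by(group) %>%
summarize(type = paste(sort(unique(type)),collapse=", "))
Returns:
# A tibble: 4 x 2
group type
<int> <chr>
1 1 a, b
2 2 b, c
3 3 b, c
4 4 a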
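For readers who would rather not depend on dplyr, the same sort-and-deduplicate idea can probably be sketched in base R with aggregate. This is not part of the original answer; the helper name collapse_types and the result variable are made up for illustration, and the sketch assumes the df from the question.

```r
# Data frame from the question
df <- data.frame(group = c(1, 1, 1, 2, 2, 3, 3, 4),
                 type  = c("a", "b", "a", "b", "c", "c", "b", "a"))

# Hypothetical helper: drop duplicates, sort, then join with ", "
collapse_types <- function(x) paste(sort(unique(x)), collapse = ", ")

# Apply the helper to type within each group (base R, no packages needed)
result <- aggregate(type ~ group, data = df, FUN = collapse_types)
names(result)[2] <- "alltypes"
result
```

Because the deduplication and sorting happen inside the helper, groups 2 and 3 should end up with identical strings regardless of the row order in df.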
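Answer 1 (score: 0)
To expand on Florian's answer, this can be extended to generate an ordered list based on the values in your data set. One example is determining the order of dates in each row: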
library(lubridate)
library(tidyverse)
# Generate random dates
set.seed(123)
Date = ymd("2018-01-01") + sort(sample(1:200, 10))
A = ymd("2018-01-01") + sort(sample(1:200, 10))
B = ymd("2018-01-01") + sort(sample(1:200, 10))
C = ymd("2018-01-01") + sort(sample(1:200, 10))
# Combine to data set
data = bind_cols(as.data.frame(Date), as.data.frame(A), as.data.frame(B), as.data.frame(C))
# Get order of dates for each row
data %>%
mutate(D = Date) %>%
gather(key = Var, value = D, -Date) %>%
arrange(Date, D) %>%
group_by(Date) %>%
summarize(Ord = paste(Var, collapse=">"))
This is only somewhat similar to the original question, but hopefully it helps someone.