How to filter rows where at least one column is greater than a threshold?

Asked: 2019-03-28 03:33:52

Tags: r filter dplyr subset

I have a data frame df:

Name    Clust1     Clust2     Clust3
AA    0.0662421  0.01742827 0.02286026
BB    0.7694628  0.03241972 0.02935754
CC    0.1099033  0.52170750 0.28385905
DD    0.2769453  0.30376152 0.24822205

I want to keep the rows in which at least one column is greater than 0.50.

I am trying the following command:

new.df <- df %>% mutate(confident = ifelse(rowSums(.[,c(1:4)] >= 0.5)>0, 'yes', 'no'))

I get a warning and no output. I want a data frame that contains only the rows where at least one of the Clust columns is greater than 0.50.

Is there a way to fix my code to get the desired output? Thanks.

1 Answer:

Answer 0: (score: 1)

We can use rowSums directly:

df[rowSums(df[2:4] >= 0.5) > 0, ]

#  Name  Clust1  Clust2   Clust3
#2   BB 0.76946 0.03242 0.029358
#3   CC 0.10990 0.52171 0.283859
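To make the rowSums approach easier to follow, here is a minimal sketch that rebuilds the example data and splits the one-liner into steps. The values are copied from the question; the intermediate names above, hits, and result are only illustrative, and stringsAsFactors = FALSE is an assumption about how the data was read.

```r
# Rebuild the example data from the question (values copied from the post).
df <- data.frame(
  Name   = c("AA", "BB", "CC", "DD"),
  Clust1 = c(0.0662421, 0.7694628, 0.1099033, 0.2769453),
  Clust2 = c(0.01742827, 0.03241972, 0.52170750, 0.30376152),
  Clust3 = c(0.02286026, 0.02935754, 0.28385905, 0.24822205),
  stringsAsFactors = FALSE
)

# Step 1: compare only the numeric columns (2 to 4) with the threshold.
# The result is a logical matrix with one row per data row.
above <- df[2:4] >= 0.5

# Step 2: count how many columns pass the threshold in each row
# (TRUE counts as 1, FALSE as 0).
hits <- rowSums(above)

# Step 3: keep the rows with at least one column above the threshold.
result <- df[hits > 0, ]
print(result)
```

Leaving the Name column out of the comparison matters because it is not numeric, so comparing it with 0.5 does not give a meaningful result.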

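Or a dplyr version using filter_at with any_vars:

library(dplyr)
df %>% filter_at(vars(starts_with("Clust")), any_vars(. >= 0.5))

As for fixing your code, as @thelatemail mentioned, you are including column 1 (Name) in rowSums, so you want to subset columns 2:4 instead. Also, we can use filter directly rather than creating a new variable with mutate, so the following should work:

df %>% filter(rowSums(.[,c(2:4)] >= 0.5) > 0)

We can also use a row-wise version (for example with apply), which would be slow for larger data sets.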
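As a sketch of a row-wise approach, the following uses base R's apply with any. It is not necessarily the exact code from the original answer; the data is rebuilt from the question, and the names keep_apply, keep_rowsums, and result_apply are illustrative.

```r
# Rebuild the example data from the question (values copied from the post).
df <- data.frame(
  Name   = c("AA", "BB", "CC", "DD"),
  Clust1 = c(0.0662421, 0.7694628, 0.1099033, 0.2769453),
  Clust2 = c(0.01742827, 0.03241972, 0.52170750, 0.30376152),
  Clust3 = c(0.02286026, 0.02935754, 0.28385905, 0.24822205),
  stringsAsFactors = FALSE
)

# Row-wise check: df[-1] drops the Name column, the comparison gives a
# logical matrix, and apply(..., 1, any) tests each row for at least one TRUE.
keep_apply <- apply(df[-1] >= 0.5, 1, any)

# Vectorised equivalent, for comparison.
keep_rowsums <- rowSums(df[-1] >= 0.5) > 0

# Both approaches should select the same rows.
stopifnot(identical(unname(keep_apply), unname(keep_rowsums)))

result_apply <- df[keep_apply, ]
print(result_apply)
```

apply calls any once per row, whereas rowSums is vectorised over the whole matrix, which is why the row-wise form tends to be slower as the number of rows grows.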