How to count the rows that are all TRUE, all FALSE, or mixed

Time: 2019-05-20 14:36:27

Tags: r

I have data like this

df<- structure(list(rowid = 1:12, P = c(TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE), T = c(TRUE, TRUE, 
TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE), 
    X = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, 
    FALSE, TRUE, TRUE)), row.names = c(NA, -12L), class = c("tbl_df", 
"tbl", "data.frame"))

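I am trying to count how many rows are all TRUE, how many are all FALSE, and how many are mixed.

So in this case, the result would look like this

AllTrue  AllFalse  Mixed 
9          0        3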

4 answers:

Answer 0: (score: 6)

With dplyr, you can do the following:

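library(dplyr)

df %>%
 summarise(AllTrue = sum(rowSums(.[2:4]) == 3),   # rows where all three logical columns are TRUE
           AllFalse = sum(rowSums(.[2:4]) == 0),  # rows where all three are FALSE
           Mixed = n() - (AllFalse + AllTrue))    # everything else

  AllTrue AllFalse Mixed
    <int>    <int> <int>
1       9        0     3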
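To see where the three mixed rows come from, the row-level classification can be spelled out in base R. This is only an illustrative sketch: the data is rebuilt as a plain data.frame, and the names `flags` and `status` are introduced here and are not part of the answer.

```r
# Rebuild the question's data as a plain data.frame (same values).
df <- data.frame(
  rowid = 1:12,
  P = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE),
  T = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
  X = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

flags <- df[-1]            # drop the id column, keep the logical columns
true_count <- rowSums(flags)  # number of TRUE values per row

# Label each row by comparing its TRUE count with the number of logical columns.
status <- ifelse(true_count == ncol(flags), "AllTrue",
                 ifelse(true_count == 0, "AllFalse", "Mixed"))

# Row ids of the mixed rows.
mixed_rows <- df$rowid[status == "Mixed"]
print(mixed_rows)
```

Because the comparison uses `ncol(flags)` instead of a literal 3, the same lines keep working if more logical columns are added.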

Answer 1: (score: 5)

One option is

table(rowSums(df[-1]))

To get the expected labels, we can convert the sums to a factor with the levels specified

s1 <- rowSums(df[-1])
table(factor(replace(s1, !s1 %in% c(0, 3), 1), levels = c(0, 1, 3),
      labels = c("AllFalse", "Mixed", "AllTrue")))
# AllFalse    Mixed  AllTrue 
#        0        3        9 

Note: both solutions use only base R.

If we need it in the tidyverse, one option that requires no reshaping and does not repeat the same computation is to use reduce to get the row sums, then convert the "Sum" column to a factor with the levels specified, and get the frequencies with count.
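The tidyverse code for this answer did not survive on this page, so the following is a reconstruction from the description above (reduce for the row sums, a factor with fixed levels, then count), not necessarily the answerer's exact code. The column name `Type` and the variable `result` are assumptions made for this sketch.

```r
library(dplyr)
library(purrr)

# Rebuild the question's data as a plain data.frame (same values).
df <- data.frame(
  rowid = 1:12,
  P = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE),
  T = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
  X = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

result <- df %>%
  # Add the logical columns element-wise; the row sum is computed only once.
  mutate(Sum = reduce(.[-1], `+`)) %>%
  # Collapse every sum that is neither 0 nor 3 into a single "Mixed" level.
  mutate(Type = factor(replace(Sum, !Sum %in% c(0, 3), 1),
                       levels = c(0, 1, 3),
                       labels = c("AllFalse", "Mixed", "AllTrue"))) %>%
  # .drop = FALSE keeps levels with zero rows (here AllFalse) in the output.
  count(Type, .drop = FALSE)

print(result)
```

Declaring the levels up front is what makes the empty AllFalse class show up with a count of 0 instead of disappearing from the result.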

Answer 2: (score: 3)

Another option is to use cut, creating breaks at the appropriate intervals and assigning labels accordingly

table(cut(rowSums(df[-1]), breaks = c(-Inf,0, ncol(df) - 2, Inf),
      labels = c("AllFalse", "Mixed", "AllTrue")))

#AllFalse    Mixed  AllTrue 
#       0        3        9 

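The middle break has to be one less than the number of logical columns, so that only rows in which every column is TRUE fall into the last interval. For a data frame made up solely of logical columns that would be ncol(df) - 1; ncol(df) - 2 is used here because the first column (rowid) is left out of the computation.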
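The choice of break can be made explicit by deriving it from the number of logical columns. The helper `count_row_types` and its `id_cols` argument are hypothetical names introduced for this sketch, which assumes at least two logical columns (with a single column the breaks 0 and n - 1 would coincide and cut would fail).

```r
# Rebuild the question's data as a plain data.frame (same values).
df <- data.frame(
  rowid = 1:12,
  P = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE),
  T = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
  X = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

# Hypothetical helper: id_cols gives the positions of non-logical columns to drop.
count_row_types <- function(data, id_cols = 1) {
  flags <- data[-id_cols]
  n <- ncol(flags)  # number of logical columns, assumed >= 2
  # Intervals: (-Inf, 0] all FALSE, (0, n - 1] mixed, (n - 1, Inf] all TRUE.
  table(cut(rowSums(flags),
            breaks = c(-Inf, 0, n - 1, Inf),
            labels = c("AllFalse", "Mixed", "AllTrue")))
}

res <- count_row_types(df)
print(res)
```

With three logical columns, n - 1 is 2, which is the same value as ncol(df) - 2 in the answer.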

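Answer 3: (score: 2)

In general, I think the rowSums solutions are better. But I often like to reshape data to long format to make operations more flexible, for example so that the columns to be summed are not hardcoded. I would also suggest keeping the number of columns to match (3 in this case) in a variable, again to avoid hardcoding. The tradeoff for this flexibility is that the code is somewhat more verbose and needs two calls to summarise.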

library(dplyr)
library(tidyr)

n <- ncol(df) - 1
df %>%
  gather(key, value, -rowid) %>%
  group_by(rowid) %>%
  summarise(all_true = sum(sum(value) == n),
            all_false = sum(sum(value) == 0),
            mixed = sum(!sum(value) %in% c(0, n))) %>%
  summarise_at(vars(-rowid), sum)
#> # A tibble: 1 x 3
#>   all_true all_false mixed
#>      <int>     <int> <int>
#> 1        9         0     3
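The tidyr documentation marks gather as superseded by pivot_longer, so the same long-format idea can also be sketched with the newer function. This is a variation, not the answerer's code: the intermediate columns `n_true` and `n_cols` are names introduced here, and the number of columns per row is counted with n() rather than kept in a separate variable.

```r
library(dplyr)
library(tidyr)

# Rebuild the question's data as a plain data.frame (same values).
df <- data.frame(
  rowid = 1:12,
  P = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE),
  T = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
  X = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

out <- df %>%
  # One row per (rowid, column) pair; every column except rowid is reshaped.
  pivot_longer(-rowid, names_to = "key", values_to = "value") %>%
  group_by(rowid) %>%
  # Per original row: how many TRUE values, and how many columns in total.
  summarise(n_true = sum(value), n_cols = n()) %>%
  # Collapse to one row of counts across all original rows.
  summarise(all_true  = sum(n_true == n_cols),
            all_false = sum(n_true == 0),
            mixed     = sum(n_true > 0 & n_true < n_cols))

print(out)
```

Nothing in the pipeline names P, T, or X, so added logical columns are picked up without further changes.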