Table function for all factor variables in a data.table

Time: 2017-05-23 17:37:16

Tags: r data.table

I have a dataset:


Here is the output of str(df):

    Classes ‘data.table’ and 'data.frame': 3000 obs. of 12 variables:
     $ year      : int 2006 2004 2003 2003 2005 2008 2009 2008 2006 2004 ...
     $ age       : int 18 24 45 43 50 54 44 30 41 52 ...
     $ sex       : Factor w/ 2 levels "1. Male","2. Female": 1 1 1 1 1 1 1 1 1 1 ...
     $ maritl    : Factor w/ 5 levels "1. Never Married",..: 1 1 2 2 4 2 2 1 1 2 ...
     $ race      : Factor w/ 4 levels "1. White","2. Black",..: 1 1 1 3 1 1 4 3 2 1 ...
     $ education : Factor w/ 5 levels "1. < HS Grad",..: 1 4 3 4 2 4 3 3 3 2 ...
     $ region    : Factor w/ 9 levels "1. New England",..: 2 2 2 2 2 2 2 2 2 2 ...
     $ jobclass  : Factor w/ 2 levels "1. Industrial",..: 1 2 1 2 2 2 1 2 2 2 ...
     $ health    : Factor w/ 2 levels "1. <=Good","2. >=Very Good": 1 2 1 2 1 2 2 1 2 2 ...
     $ health_ins: Factor w/ 2 levels "1. Yes","2. No": 2 2 1 1 1 1 1 1 1 1 ...
     $ logwage   : num 4.32 4.26 4.88 5.04 4.32 ...
     $ wage      : num 75 70.5 131 154.7 75 ...

I want to apply the table function to each factor variable.

My attempt: data.table, but it does not work.

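1 answer:

Answer 0: (score: 2)

This returns a named list of tables, one per factor variable in the dataset, with each list element named after its variable. I have provided some sample data below.

Here, lapply iterates over a subset of the data.table that contains only the factor variables and builds a table for each of them.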

    lapply(dt[, .SD, .SDcols = names(dt)[sapply(dt, is.factor)]], table)
    $origin

    A B C E 
    2 1 2 1 

    $destination

    B C D E F 
    2 1 1 1 1 

@mt1022 suggested two alternative syntaxes: a more concise one,

    lapply(dt[, .SD, .SDcols = sapply(dt, is.factor)], table)

and one that uses base R-style syntax (setting with = FALSE so that the columns are subset directly by the logical vector).

    lapply(dt[, sapply(dt, is.factor), with = FALSE], table)
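As a quick sanity check, the three variants can be run side by side on a tiny table. This is only a sketch: the table and its column names (grp, n) are hypothetical and not part of the question's data, and it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy table: one factor column and one integer column.
dt <- data.table(grp = factor(c("a", "b", "a")), n = 1:3)

# Named logical vector marking which columns are factors.
is_fac <- sapply(dt, is.factor)

# Variant 1: select by column name through .SDcols.
by_name <- lapply(dt[, .SD, .SDcols = names(dt)[is_fac]], table)

# Variant 2: pass the logical vector to .SDcols directly.
by_logical <- lapply(dt[, .SD, .SDcols = is_fac], table)

# Variant 3: base R-style column subsetting with with = FALSE.
by_with <- lapply(dt[, is_fac, with = FALSE], table)

# Compare the results of the three variants.
print(all.equal(by_name, by_logical))
print(all.equal(by_name, by_with))
```

All three select the same columns, so the choice between them is mostly a matter of readability.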

Data

    dt <- 
    structure(list(id = c(1L, 1L, 2L, 3L, 3L, 3L), origin = structure(c(1L, 
    3L, 1L, 2L, 4L, 3L), .Label = c("A", "B", "C", "E"), class = "factor"), 
        destination = structure(c(1L, 3L, 1L, 4L, 2L, 5L), .Label = c("B", 
        "C", "D", "E", "F"), class = "factor"), price = c(2L, 2L, 
        3L, 6L, 6L, 6L)), .Names = c("id", "origin", "destination", 
    "price"), row.names = c(NA, -6L), class = c("data.table", "data.frame"
    ))
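
For readers who prefer not to paste dput() output, the same sample values can be built with the data.table() constructor. The sketch below assumes the data.table package is installed; the names factor_cols and tabs are illustrative and do not appear in the original answer. It also shows how to pull individual tables and counts out of the resulting list.

```r
library(data.table)

# Same values as the dput() output above, written with data.table().
dt <- data.table(
  id          = c(1L, 1L, 2L, 3L, 3L, 3L),
  origin      = factor(c("A", "C", "A", "B", "E", "C")),
  destination = factor(c("B", "D", "B", "E", "C", "F")),
  price       = c(2L, 2L, 3L, 6L, 6L, 6L)
)

# Names of the factor columns, then one frequency table per factor column.
factor_cols <- names(dt)[sapply(dt, is.factor)]
tabs <- lapply(dt[, .SD, .SDcols = factor_cols], table)

print(names(tabs))         # the list is named after the factor columns
print(tabs$origin)         # frequency table for a single column
print(tabs$origin[["A"]])  # count for a single level
```

Because the result is an ordinary named list, each table can be accessed with $ or [[ and passed on to functions such as prop.table.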