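Storing data frames in a list

Time: 2017-03-22 16:19:47

Tags: r list dataframe crosstab

I am trying to cross-tabulate tables from a list of data frames generated from my raw data. The original data frame consists of 1004 observations of 8 variables.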

> summary(mydata)
 ARTICLE      COMPSYNT2   POSITION      COMPTYPE   SUBSTYPE2    VARIANT   
 No :832   NP      :342   Fin :535   Hum    :435   Comp :  2   Lieu :504  
 Yes:172   DetPoss :334   Init:284   SoA    :232   Conc : 12   Place:500  
           NFClau  :238   Med :185   Conc   :160   Contr:426              
           SubClau : 30              Abstr  :102   Dep  : 45              
           ProForm : 20              Prop   : 31   Emp  :104              
           PronPers: 16              Plant  : 17   Repl :327              
           (Other) : 24              (Other): 27   Subst: 88              
     DECADE           GENRE2   
 Min.   :1500   Treat_Ess:299  
 1st Qu.:1618   Novel    :219  
 Median :1650   Drama    :143  
 Mean   :1644   Poetry   : 86  
 3rd Qu.:1680   Memoirs  : 82  
 Max.   :1710   Corresp  : 81  
                (Other)  : 94 

I would like to get one data frame for each level of the variable "DECADE" (there are 21 decades). I did the following:

> mydata.split<-split(mydata, mydata$DECADE)

# remove the column "DECADE" since no longer needed

> mydata.split<-lapply(mydata.split, function(x) x=x[, -7]) 

    > mydata.split[1:2] # outputs the first two elements of the list
    $`1500`
      ARTICLE COMPSYNT2 POSITION COMPTYPE SUBSTYPE2 VARIANT GENRE2
    1      No    NFClau      Med      SoA     Contr    Lieu Poetry

    $`1510`
      ARTICLE COMPSYNT2 POSITION COMPTYPE SUBSTYPE2 VARIANT    GENRE2
    2      No  PronPers      Fin      Hum     Contr    Lieu     Novel
    3      No        NP     Init      Hum      Repl    Lieu     Novel
    4      No        NP     Init     Conc     Subst    Lieu Treat_Ess
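The split step can also be written so that the column is dropped by name rather than by position, and so that a single decade is retrieved by its name. This is a minimal sketch on an invented toy data frame (`toy` and its values are assumptions, not the real data):

```r
# Toy stand-in for mydata (values are invented for illustration only)
toy <- data.frame(
  ARTICLE = factor(c("No", "Yes", "No", "No")),
  VARIANT = factor(c("Lieu", "Place", "Lieu", "Place")),
  DECADE  = c(1500, 1510, 1510, 1510)
)

# One data frame per decade; split() names the elements after the decades
toy.split <- split(toy, toy$DECADE)

# Drop DECADE by name instead of by column index
toy.split <- lapply(toy.split, function(x) x[, names(x) != "DECADE", drop = FALSE])

names(toy.split)       # the decade labels
toy.split[["1510"]]    # [[ ]] returns the data frame itself, not a one-element list
```

Using `[[ ]]` (or the decade name) avoids the extra `as.data.frame()` call that is needed when a one-element list is extracted with `[ ]`.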

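Then, in order to cross-tabulate each data frame of the list as a function of the variable "VARIANT", I accessed each table separately and, in a for loop, cross-tabulated columns 1 to n against the column "VARIANT". I then tried to stack all the resulting tables into one (one per decade), but without success.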

> y1560s<-as.data.frame(mydata.split[6])

> for(i in 1:ncol(y1560s)){
    +   cross.table<-table(y1560s[, i], y1560s[, 6])
    +   data.list<-append(cross.table, data.list)
    +   big.table<-do.call(cbind, data.list)
    + }

The output is very strange:

> head(big.table)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
     [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27]
     [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40]
     [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50] [,51] [,52] [,53]
     [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61] [,62] [,63] [,64] [,65] [,66]
     [,67] [,68] [,69] [,70] [,71] [,72] [,73] [,74] [,75] [,76] [,77] [,78] [,79]
     [,80] [,81] [,82] [,83] [,84] [,85] [,86] [,87] [,88] [,89] [,90] [,91] [,92]
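A likely explanation for this output is that `append()` treats the table as a plain vector, so every count becomes its own list element and `cbind` then lays the counts out as separate columns. A minimal sketch on invented data (`decade` and `predictors` are hypothetical names) that stores each whole table as one list element and stacks only once, after the loop:

```r
# Toy stand-in for one decade (invented values)
decade <- data.frame(
  ARTICLE  = factor(c("No", "No", "Yes")),
  POSITION = factor(c("Fin", "Init", "Fin")),
  VARIANT  = factor(c("Lieu", "Place", "Lieu"))
)

# Every column except VARIANT gets cross-tabulated against VARIANT
predictors <- setdiff(names(decade), "VARIANT")

# Create the list before the loop, one slot per predictor
data.list <- vector("list", length(predictors))
names(data.list) <- predictors

for (p in predictors) {
  # [[<- stores the whole table as a single list element
  data.list[[p]] <- table(decade[[p]], decade$VARIANT)
}

# Stack once, after the loop: rows are factor levels, columns are the variants
big.table <- do.call(rbind, data.list)
big.table
```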

What I expect is a big table like the one below, but the process I used to obtain it is rather cumbersome.

> article<-table(y1560s[, 1], y1560s[, 6])
> compsynt<-table(y1560s[, 2], y1560s[, 6])
> position<-table(y1560s[, 3], y1560s[, 6])
> comptype<-table(y1560s[, 4], y1560s[, 6])
> substype<-table(y1560s[, 5], y1560s[, 6])
> genre<-table(y1560s[, 7], y1560s[, 6])
> big.table<-rbind(article, compsynt, position, comptype, substype, genre)
> big.table
          Lieu Place
No          14     4
Yes          0     3
AdjP         0     0
DetPoss      0     4
Gap          0     0
NFClau       6     0
NP           6     2
Num          0     0
ProForm      2     1
PronPers     0     0
PronPoss     0     0
SubClau      0     0
Fin          7     5
Init         2     2
Med          5     0
Abstr        3     1
Act          0     0
Anim         0     0
Conc         2     0
Hum          1     6
Plant        2     0
Prop         0     0
SoA          6     0
Comp         0     0
Conc         0     0
Contr        6     1
Dep          0     2
Emp          0     1
Repl         4     2
Subst        4     1
Corresp      0     0
Drama        1     1
Memoirs      0     0
Non-Litt     0     0
Novel        0     1
Other        0     0
Pamphlet     0     0
Poetry       0     1
Rhetoric     0     0
Travel       0     0
Treat_Ess   13     4
Undef        0     0
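The all-zero rows in this table are consistent with how `table()` handles factors: subsetting a factor keeps its unused levels, and `table()` reports them with zero counts. That is also why every decade yields a table with the same rows. A small sketch with invented values:

```r
f <- factor(c("Fin", "Init", "Med", "Fin"))
v <- factor(c("Lieu", "Place", "Lieu", "Lieu"))
keep <- f != "Med"

# Subsetting keeps the unused level "Med", so table() lists it with zeros
with_zeros <- table(f[keep], v[keep])

# droplevels() discards unused levels before tabulating
without_zeros <- table(droplevels(f[keep]), v[keep])

with_zeros
without_zeros
```

Keeping the zero rows is convenient when tables from different decades are to be stacked or compared, since they then all share the same layout.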

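Is there a simpler way to 1) access the data frames stored in the list of decades, and 2) cross-tabulate all the columns against the variable "VARIANT", held constant, for each element of the decade list?

Thank you very much in advance for your suggestions and tips.

CBechet.

Edit: Trying the following looks promising; I just can't manage to store the resulting data frames in a list for further processing.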

> for(i in 1:length(mydata.split)) {
+   mytable<-as.data.frame(mydata.split[i])
+   article<-table(mytable[, 2], mytable[, 1])
+   compsynt<-table(mytable[, 3], mytable[, 1])
+   position<-table(mytable[, 4], mytable[, 1])
+   comptype<-table(mytable[, 5], mytable[, 1])
+   substype<-table(mytable[, 6], mytable[, 1])
+   genre<-table(mytable[, 7], mytable[, 1])
+   big.table<-as.data.frame(rbind(article, compsynt, position, comptype, substype, genre))
}
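In the loop above, `big.table` is overwritten on every iteration, so only the last decade survives. One way to keep all of them, sketched here on an invented two-decade list (`toy.split` and `big.tables` are hypothetical names, with VARIANT assumed to be in column 1 as in the loop), is to pre-allocate a list and assign into it by index:

```r
# Toy stand-in for mydata.split, with VARIANT in column 1 (invented values)
lv <- c("Lieu", "Place")
toy.split <- list(
  `1500` = data.frame(VARIANT  = factor("Lieu", levels = lv),
                      ARTICLE  = factor("No", levels = c("No", "Yes")),
                      POSITION = factor("Fin", levels = c("Fin", "Init"))),
  `1510` = data.frame(VARIANT  = factor(c("Lieu", "Place"), levels = lv),
                      ARTICLE  = factor(c("No", "Yes"), levels = c("No", "Yes")),
                      POSITION = factor(c("Fin", "Init"), levels = c("Fin", "Init")))
)

# One slot per decade, reusing the decade names
big.tables <- vector("list", length(toy.split))
names(big.tables) <- names(toy.split)

for (i in seq_along(toy.split)) {
  mytable  <- toy.split[[i]]   # [[ ]] returns the data frame itself
  article  <- table(mytable[, 2], mytable[, 1])
  position <- table(mytable[, 3], mytable[, 1])
  # Store the result in slot i instead of overwriting a single variable
  big.tables[[i]] <- as.data.frame(rbind(article, position))
}

big.tables[["1510"]]
```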

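2 Answers:

Answer 0: (score: 1)

Based on your description and the code you posted in your edit, but without a reproducible data sample, I suggest the following approach.

This should achieve your desired output, which is one very large table containing all the crosstab results.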

do.call(what = rbind, args = 
          sapply(mydata.split, function(a_decade){
            my_table <- as.data.frame(a_decade)
            lapply(1:7, function(a_column){
              table(my_table[, a_column], my_table[, 1])
            })
          })
)

Answer 1: (score: 1)

Simply convert your promising for loop into an lapply call, which returns a list of big tables equal in length to the mydata.split list:

bigTablesList <- lapply(mydata.split, function(mytable) {
  article  <- table(mytable[, 2], mytable[, 1])
  compsynt <- table(mytable[, 3], mytable[, 1])
  position <- table(mytable[, 4], mytable[, 1])
  comptype <- table(mytable[, 5], mytable[, 1])
  substype <- table(mytable[, 6], mytable[, 1])
  genre    <- table(mytable[, 7], mytable[, 1])
  as.data.frame(rbind(article, compsynt, position, comptype, substype, genre))
})
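For further processing, a list of per-decade tables like this can be flattened into one long data frame that records which decade each row came from. This is a hedged sketch on invented counts; the column names `DECADE` and `LEVEL` and the objects `tagged` and `all.decades` are illustrative assumptions:

```r
# Toy stand-in for bigTablesList (invented counts)
bigTablesList <- list(
  `1500` = data.frame(Lieu = c(1, 0), Place = c(0, 0), row.names = c("No", "Yes")),
  `1510` = data.frame(Lieu = c(2, 0), Place = c(0, 1), row.names = c("No", "Yes"))
)

# Turn the row names into a column and tag each table with its decade
tagged <- Map(function(tab, decade) {
  data.frame(DECADE = as.integer(decade),
             LEVEL  = rownames(tab),
             tab,
             stringsAsFactors = FALSE)
}, bigTablesList, names(bigTablesList))

# Stack all decades and reset the row names to plain integers
all.decades <- do.call(rbind, tagged)
rownames(all.decades) <- NULL
all.decades
```

Individual decades remain available by name, for example `bigTablesList[["1510"]]`.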
