Run a statistical test on every possible combination of files in a folder

Time: 2015-12-30 03:31:49

Tags: python pandas statistics

I have a folder containing about 100 CSV files. I want to run a two-sample Kolmogorov-Smirnov test on every possible combination of files. I can do this manually like so:

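import pandas as pd
from scipy import stats  # import the submodule explicitly; a bare "import scipy" may not load scipy.stats

df = pd.read_csv(r'file1.csv')
df2 = pd.read_csv(r'file2.csv')
# ks_2samp expects 1-D samples; this assumes the data is in the first column of each file
stats.ks_2samp(df.iloc[:, 0], df2.iloc[:, 0])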

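But I don't want to assign all the variables by hand. Is there a way to iterate over the files and compare every possible combination using the statistical test?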

1 answer:

Answer 0 (score: 3)

It sounds like you want the Cartesian product of your list of filenames with itself.

Cartesian product of lists in python

In your implementation, you should have a list of all the filenames, and then call

itertools.product(files, files)

The documentation for itertools.product mentions that product(A, B) is equivalent to

((x,y) for x in A for y in B)
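As a quick check of that equivalence, here is a minimal sketch. The file names are hypothetical placeholders, not taken from the question. It also shows itertools.combinations, which is not part of the answer above but may be a better fit if each unordered pair should be tested only once and a file should never be compared with itself.

```python
import itertools

# Hypothetical file names, used only for illustration.
files = ["a.csv", "b.csv", "c.csv"]

# Cartesian product of the list with itself.
pairs = list(itertools.product(files, files))

# The equivalent generator expression from the itertools documentation.
pairs_genexp = list((x, y) for x in files for y in files)

assert pairs == pairs_genexp
print(len(pairs))  # 9: includes self-pairs and both orderings of each pair

# Alternative: each unordered pair exactly once, with no self-pairs.
unique_pairs = list(itertools.combinations(files, 2))
print(len(unique_pairs))  # 3
```

For roughly 100 files, the product yields 10,000 pairs while combinations of size 2 yields 4,950, so the choice affects run time as well as the shape of the results.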
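Putting the pieces together, the approach in this answer might look like the sketch below. The function name ks_all_pairs, the "*.csv" glob pattern, and the choice of the first column as the sample are all assumptions made for illustration; adjust them to match how the files are actually laid out.

```python
import glob
import itertools
import os

import pandas as pd
from scipy import stats


def ks_all_pairs(folder, column=0):
    """Run a two-sample KS test on every ordered pair of CSV files in `folder`.

    Assumption: each file has a header row and the sample of interest
    is in the column at position `column`.
    """
    files = sorted(glob.glob(os.path.join(folder, "*.csv")))
    # Read each file once rather than once per pair.
    samples = {
        f: pd.read_csv(f).iloc[:, column].dropna().to_numpy() for f in files
    }

    rows = []
    for f1, f2 in itertools.product(files, files):
        statistic, pvalue = stats.ks_2samp(samples[f1], samples[f2])
        rows.append({
            "file1": os.path.basename(f1),
            "file2": os.path.basename(f2),
            "statistic": statistic,
            "pvalue": pvalue,
        })
    return pd.DataFrame(rows)
```

Calling ks_all_pairs with the path to the folder returns one row per pair of files. Because the product includes each file paired with itself, those rows will show a statistic of 0; swapping itertools.product(files, files) for itertools.combinations(files, 2) would skip them.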