Is there a pythonic way to do a contingency table in Pandas?

Asked: 2015-04-27 16:41:01

Tags: python python-2.7 pandas dataframe

Given a dataframe that looks like this:

            A   B      
2005-09-06  5  -2  
2005-09-07 -1   3  
2005-09-08  4   5 
2005-09-09 -8   2
2005-09-10 -2  -5
2005-09-11 -7   9 
2005-09-12  2   8  
2005-09-13  6  -5  
2005-09-14  6  -5  

Is there a pythonic way to create a 2x2 matrix like this:

    1  0
 1  a  b
 0  c  d

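where:

a = number of obs where the corresponding elements of column A and column B are both positive.

b = number of obs where the corresponding element is positive in column A and negative in column B.

c = number of obs where the corresponding element is negative in column A and positive in column B.

d = number of obs where the corresponding elements of column A and column B are both negative.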

For this example the output would be:

    1  0
 1  2  3
 0  3  1

Thanks.

4 answers:

Answer 0 (score: 26)

Probably the easiest way is to use the pandas function crosstab. Borrowing the setup from Dyno Fu's answer:

import pandas as pd
from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO
table = """dt          A   B
2005-09-06  5  -2
2005-09-07 -1   3
2005-09-08  4   5
2005-09-09 -8   2
2005-09-10 -2  -5
2005-09-11 -7   9
2005-09-12  2   8
2005-09-13  6  -5
2005-09-14  6  -5
"""
sio = StringIO(table)
df = pd.read_table(sio, sep=r"\s+", parse_dates=['dt'])
df.set_index("dt", inplace=True)

pd.crosstab(df.A > 0, df.B > 0)

Output:

B      False  True 
A                  
False      1      3
True       3      2

[2 rows x 2 columns]
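
The crosstab output above lists False before True, whereas the question asks for the positive (1) row and column first. A minimal sketch of one way to get that layout, assuming the same data as the question and treating zero as non-positive (the question does not say how zeros should be handled). The names pos_a and pos_b are illustrative only:

```python
import pandas as pd

# Same A/B values as in the question (the date index is left out for brevity).
df = pd.DataFrame({
    "A": [5, -1, 4, -8, -2, -7, 2, 6, 6],
    "B": [-2, 3, 5, 2, -5, 9, 8, -5, -5],
})

# Encode "positive" as 1 and everything else as 0.
# Assumption: a value of exactly zero would be counted as 0 (non-positive).
pos_a = (df["A"] > 0).astype(int)
pos_b = (df["B"] > 0).astype(int)

# Cross-tabulate, then sort both axes in descending order so 1 comes before 0.
tab = (
    pd.crosstab(pos_a, pos_b)
    .sort_index(axis=0, ascending=False)
    .sort_index(axis=1, ascending=False)
)
print(tab)
```

Rows correspond to A and columns to B, so the top-left cell is the count of rows where both are positive.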

The table is also usable if you want to run a Fisher exact test with scipy.stats, etc.:

from scipy.stats import fisher_exact
tab = pd.crosstab(df.A > 0, df.B > 0)
fisher_exact(tab)
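
As a sketch of what that call gives back: fisher_exact returns the sample odds ratio and a p-value (two-sided by default), which can be unpacked directly. The DataFrame below simply re-creates the question's A and B columns without the date index, an assumption made to keep the example self-contained:

```python
import pandas as pd
from scipy.stats import fisher_exact

# Same A/B values as in the question.
df = pd.DataFrame({
    "A": [5, -1, 4, -8, -2, -7, 2, 6, 6],
    "B": [-2, 3, 5, 2, -5, 9, 8, -5, -5],
})

# 2x2 table of counts; rows are A > 0 (False, True), columns are B > 0 (False, True).
tab = pd.crosstab(df["A"] > 0, df["B"] > 0)

# Pass the underlying 2x2 array of counts and unpack the two results.
odds_ratio, p_value = fisher_exact(tab.values)
print(odds_ratio, p_value)
```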

Answer 1 (score: 16)

Let's call your dataframe data. Try:

a = data['A'] > 0
b = data['B'] > 0
data.groupby([a, b]).count()

Answer 2 (score: 6)

Here is a very helpful page about the pandas crosstab function:

http://chrisalbon.com/python/pandas_crosstabs.html

So for what you want to do, I think you should use

import pandas as pd
pd.crosstab(data['A']>0, data['B']>0)

Hope that helps!
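
If row and column totals are also useful, crosstab accepts margins=True. A small sketch, assuming the question's data; the "pos"/"non-pos" labels are illustrative and are used here in place of booleans so that the added "All" label sits alongside plain strings:

```python
import pandas as pd

# Same A/B values as in the question.
data = pd.DataFrame({
    "A": [5, -1, 4, -8, -2, -7, 2, 6, 6],
    "B": [-2, 3, 5, 2, -5, 9, 8, -5, -5],
})

# Hypothetical string labels for the sign of each column (zero counts as "non-pos").
labels = {True: "pos", False: "non-pos"}
a_sign = (data["A"] > 0).map(labels)
b_sign = (data["B"] > 0).map(labels)

# margins=True appends an "All" row and column holding the totals.
tab = pd.crosstab(a_sign, b_sign, margins=True)
print(tab)
```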

Answer 3 (score: 4)

import pandas as pd
from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO

table = """dt          A   B
2005-09-06  5  -2
2005-09-07 -1   3
2005-09-08  4   5
2005-09-09 -8   2
2005-09-10 -2  -5
2005-09-11 -7   9
2005-09-12  2   8
2005-09-13  6  -5
2005-09-14  6  -5
"""
sio = StringIO(table)
df = pd.read_table(sio, sep=r"\s+", parse_dates=['dt'])
df.set_index("dt", inplace=True)

a = df['A'] > 0
b = df['B'] > 0
df1 = df.groupby([a,b]).count()
print df1["A"].unstack()  # Python 2 print statement; on Python 3: print(df1["A"].unstack())

Output:

B      False  True
A
False      1      3
True       3      2

This is just lanenok's answer, with unstack() added to make it more readable. Credit should go to lanenok.
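
A possible variation, offered as a sketch rather than as part of either answer: size() counts rows per group once instead of once per column, so no column has to be selected before unstacking, and fill_value=0 keeps a combination that never occurs from showing up as NaN. The data is the question's, rebuilt inline without the date index:

```python
import pandas as pd

# Same A/B values as in the question.
df = pd.DataFrame({
    "A": [5, -1, 4, -8, -2, -7, 2, 6, 6],
    "B": [-2, 3, 5, 2, -5, 9, 8, -5, -5],
})

a = df["A"] > 0
b = df["B"] > 0

# size() gives one count per (a, b) group; unstack() moves the B level to columns.
counts = df.groupby([a, b]).size().unstack(fill_value=0)
print(counts)
```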