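I have a question similar to "Crosstab with multiple items", but instead of doing it in R, I'm trying to do it with crosstab in Python pandas.
I've been trying to build a demographic table with the pandas crosstab function, but I can only cross-tabulate one demographic variable at a time. In other words, I want a crosstab in which all the row variables sit at the same level. Maybe this isn't what crosstab is for, and something like a pandas pivot table would work better?
Currently I use the following three lines of code, but I assume there is some way to combine them: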
genderTable = pd.crosstab(refQtrData['GENDER'], [refQtrData['FUNDINGSOURCE'],refQtrData['PROVIDER'],refQtrData['LOCATION']], margins=True)
raceTable = pd.crosstab(refQtrData['RACETH4'], [refQtrData['FUNDINGSOURCE'],refQtrData['PROVIDER'],refQtrData['LOCATION']], margins=True)
ageTable = pd.crosstab(refQtrData['REFERRED'], [refQtrData['FUNDINGSOURCE'],refQtrData['PROVIDER'],refQtrData['LOCATION']], values=refQtrData['AGEREF'], aggfunc='mean')
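What I want to produce: Demographic Table
This was originally done in SPSS with the following code, but I'm trying to move it to Python. Just as SPSS CTABLES lets me have multiple categories and variables, I'd like multiple rows corresponding to different variables without them having to sit at different levels.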
CTABLES
/VLABELS VARIABLES= GENDER RACE AGE FUNDINGSOURCE PROVIDER LOCATION
DISPLAY=LABEL
/TABLE REFERRED [C][COUNT F40.0] + GENDER [C][COUNT F40.0, COLPCT.COUNT PCTPAREN40.0] + RACE
[C][COUNT F40.0, COLPCT.COUNT PCTPAREN40.0] + AGE [S][MEAN] + AGE [S][MINIMUM, MAXIMUM]
BY FUNDINGSOURCE [C] > PROVIDER [C] > LOCATION [C]
/SLABELS VISIBLE=NO
/CATEGORIES VARIABLES=GENDER RACE ORDER=A KEY=VALUE MISSING=INCLUDE EMPTY=INCLUDE
/CATEGORIES VARIABLES=FUNDINGSOURCE ORDER=A KEY=VALUE MISSING=INCLUDE EMPTY=EXCLUDE
/CATEGORIES VARIABLES=PROVIDER [1, 2] EMPTY=EXCLUDE
/CATEGORIES VARIABLES=LOCATION [1, 2] EMPTY=EXCLUDE.
Answer 0 (score: 0)
Without a reproducible example, we can fall back on the pandas crosstab documentation, which provides the sample crosstab data copied below.
import pandas as pd
import numpy as np
a = np.array(["foo", "foo", "foo", "foo", "bar", "bar","bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one", "one", "two", "two", "two", "one"], dtype=object)
c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny", "shiny", "dull", "shiny", "shiny", "shiny"],dtype=object)
d = np.array(["1foo", "1foo", "1foo", "1foo", "1bar", "1bar","1bar", "1bar", "1foo", "1foo", "1foo"], dtype=object)
This gives four arrays. Build the crosstabs; each call returns a DataFrame.
df1 = pd.crosstab(a, [b, c], rownames=['aa'], colnames=['b', 'c'])
df2 = pd.crosstab(d, [b, c], rownames=['aa'], colnames=['b', 'c'])
Use pandas.concat([], axis=...) to combine them:
pd.concat([df1, df2], axis=0)
b     one        two
c    dull shiny dull shiny
aa
bar     1     2    1     0
foo     2     2    1     2
1bar    1     2    1     0
1foo    2     2    1     2
pd.concat([df1, df2], axis=1)
b     one        two        one        two
c    dull shiny dull shiny dull shiny dull shiny
1bar  NaN   NaN  NaN   NaN  1.0   2.0  1.0   0.0
1foo  NaN   NaN  NaN   NaN  2.0   2.0  1.0   2.0
bar   1.0   2.0  1.0   0.0  NaN   NaN  NaN   NaN
foo   2.0   2.0  1.0   2.0  NaN   NaN  NaN   NaN
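When stacking with axis=0, it may help to label each block of rows with the variable it came from, so the combined table stays readable. A minimal sketch, assuming the same four example arrays; the labels "a_var" and "d_var" and the level name "variable" are hypothetical names chosen only for illustration:

```python
import numpy as np
import pandas as pd

a = np.array(["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one", "one", "two", "two", "two", "one"], dtype=object)
c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny", "shiny", "dull", "shiny", "shiny", "shiny"], dtype=object)
d = np.array(["1foo", "1foo", "1foo", "1foo", "1bar", "1bar", "1bar", "1bar", "1foo", "1foo", "1foo"], dtype=object)

df1 = pd.crosstab(a, [b, c], rownames=["aa"], colnames=["b", "c"])
df2 = pd.crosstab(d, [b, c], rownames=["aa"], colnames=["b", "c"])

# keys= adds an outer index level that records which crosstab each row came from.
# "a_var" / "d_var" are made-up labels, not part of the original example.
labeled = pd.concat(
    [df1, df2],
    axis=0,
    keys=["a_var", "d_var"],
    names=["variable", "aa"],
)
print(labeled)
```

The result has a two-level row index (variable, then category), which is roughly what a stacked demographic table needs.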
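As for creating the three crosstabs with a single function call, implement a function that accepts the data and returns the concatenated crosstabs. I'm not sure it can be done as a reasonable one-liner.
That leaves you with one DataFrame to modify further or join with other DataFrames.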
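Such a function could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name demographic_table, the level names "variable" and "category", and the small ref_qtr_data frame are all hypothetical, and the toy data merely reuses a few column names from the question.

```python
import pandas as pd

def demographic_table(data, row_vars, col_vars):
    """Build one crosstab per row variable and stack them into one DataFrame."""
    cols = [data[c] for c in col_vars]
    tables = [pd.crosstab(data[r], cols) for r in row_vars]
    # keys= labels each block of rows with the variable it was built from.
    return pd.concat(tables, axis=0, keys=row_vars, names=["variable", "category"])

# Hypothetical toy data standing in for refQtrData.
ref_qtr_data = pd.DataFrame({
    "GENDER": ["F", "M", "F", "M", "F", "M"],
    "RACETH4": ["A", "B", "A", "C", "B", "A"],
    "FUNDINGSOURCE": [1, 1, 2, 2, 1, 2],
    "PROVIDER": [1, 2, 1, 2, 1, 2],
})

table = demographic_table(
    ref_qtr_data,
    row_vars=["GENDER", "RACETH4"],
    col_vars=["FUNDINGSOURCE", "PROVIDER"],
)
print(table)
```

Because every crosstab shares the same column variables, the blocks line up column for column. A summary table such as the mean-age crosstab from the question could be appended to the list before concatenating, though its values would be means rather than counts.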