Hi, I'm trying to get a crosstab from the MultiIndex DataFrame "df":
df.tail()
code           X1     X2     X3
pays          USA    USA    USA
desc        phase  phase  phase
2020-01-01      a      a      a
2020-02-01      b      c      d
2020-03-01      a      a      b
2020-04-01      c      a      a
2020-05-01      d      a      d
I would like to get something like this:
      X1          X2          X3
      a  b  c  d  a  b  c  d  a  b  c  d
X1 a
   b
   c
   d
X2 a
   b
   c
   d
X3 a
   b
   c
   d
In each cell I would like the number (or the percentage) of Xi values in (a, b, c, d) for each Xj value.
I tried:
pd.crosstab(index = df, columns = df)
but I get an error:
ValueError: Shape of passed values is (3, 2), indices imply (605, 2)
Thanks for your help.
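For reference, pd.crosstab expects one 1-D array or Series (or a list of them) per axis, so passing the whole DataFrame as both index and columns does not work. Here is a minimal sketch for a single pair of columns. It assumes the three-level column header can be reduced to its "code" level. The five rows are rebuilt from the df.tail() output above, and the names flat, counts and shares are hypothetical:

```python
import pandas as pd

# Rebuild the five rows shown by df.tail(), keeping the question's
# three-level column header (code / pays / desc).
columns = pd.MultiIndex.from_tuples(
    [("X1", "USA", "phase"), ("X2", "USA", "phase"), ("X3", "USA", "phase")],
    names=["code", "pays", "desc"],
)
df = pd.DataFrame(
    [["a", "a", "a"], ["b", "c", "d"], ["a", "a", "b"],
     ["c", "a", "a"], ["d", "a", "d"]],
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01",
                          "2020-04-01", "2020-05-01"]),
    columns=columns,
)

# Keep only the "code" level so flat["X1"] is a 1-D Series, not a sub-frame.
flat = df.copy()
flat.columns = flat.columns.get_level_values("code")

# One Series per axis: raw counts, then row-normalized shares.
counts = pd.crosstab(flat["X1"], flat["X2"])
shares = pd.crosstab(flat["X1"], flat["X2"], normalize="index")
print(counts)
print(shares)
```

In these five rows X2 only takes the values a and c, so the result has only those two columns. Reindexing each block to a fixed list of levels is one way to get the full a to d grid.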
Answer 0 (score: 0)
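I didn't find a way to do this with pd.crosstab alone, but it can be done with a double loop. I'd like to use this function with ordered (categorical) dtypes, but my naive attempt (commented out below) didn't work.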
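import pandas as pd
import numpy as np


def full_crosstab(df, row_keys=None, col_keys=None):
    # Default to crossing every column against every column.
    row_keys = df.columns if row_keys is None else row_keys
    col_keys = df.columns if col_keys is None else col_keys
    df_final = []
    for outer in row_keys:
        # One crosstab per (outer, inner) pair, placed side by side.
        df_outer = []
        for inner in col_keys:
            df_inner = pd.crosstab(df[outer], df[inner])
            df_outer.append(df_inner)
        df_outer = pd.concat(df_outer, axis=1, keys=col_keys)
        df_final.append(df_outer)
    # Stack the rows of blocks vertically, labelled by the outer column.
    return pd.concat(df_final, keys=row_keys)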
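

def category(values, size):
    # Draw `size` random values from `values` as a plain object Series.
    series = np.random.choice(values, size=size)
    return pd.Series(series)
    # Ordered-categorical variant that did not work as hoped:
    #dtype = pd.CategoricalDtype(categories=values, ordered=True)
    #return pd.Series(series, dtype=dtype)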


size = 100
mydf = pd.DataFrame(dict(
    age_range=category(['<18', '18-34', '35-64', '65+'], size=size),
    reg=category(['yes', 'no'], size=size),
    issue=category(['guns', 'schools', 'healthcare'], size=size),
))
df_ct = full_crosstab(mydf)
print(df_ct)
                     age_range           reg     issue
                     18-34 35-64 65+ <18  no yes guns healthcare schools
age_range 18-34         22     0   0   0  14   8    8          7       7
          35-64          0    24   0   0  10  14   11         10       3
          65+            0     0  23   0  13  10    9          5       9
          <18            0     0   0  31  17  14    5         14      12
reg       no            14    10  13  17  54   0   13         19      22
          yes            8    14  10  14   0  46   20         17       9
issue     guns           8    11   9   5  13  20   33          0       0
          healthcare     7    10   5  14  19  17    0         36       0
          schools        7     3   9  12  22   9    0          0      31
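Applied to the question's data, the same double-loop idea can produce percentages on a fixed a to d grid. This sketch rests on several assumptions:

- the column header has already been reduced to its "code" level;
- every Xi takes values in a, b, c, d;
- "percentage" means the share of each Xj value among the rows with a given Xi value (row-normalized).

The helper full_crosstab_pct and the constant LEVELS are hypothetical names. The reindex step is a swapped-in alternative to the categorical approach that did not work above:

```python
import pandas as pd

# The five rows from the question's df.tail(), with the header reduced
# to its "code" level (X1, X2, X3).
df = pd.DataFrame(
    {"X1": list("abacd"), "X2": list("acaaa"), "X3": list("adbad")},
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01",
                          "2020-04-01", "2020-05-01"]),
)

LEVELS = ["a", "b", "c", "d"]  # assumed value set shared by every Xi


def full_crosstab_pct(df, levels):
    """Cross every column with every column; each cell is the share of
    rows with outer == row value that have inner == column value."""
    blocks = []
    for outer in df.columns:
        row_blocks = [
            # normalize="index" makes each observed row of the block sum to 1;
            # reindex pads values that never occur with 0.
            pd.crosstab(df[outer], df[inner], normalize="index")
            .reindex(index=levels, columns=levels, fill_value=0)
            for inner in df.columns
        ]
        blocks.append(pd.concat(row_blocks, axis=1, keys=list(df.columns)))
    return pd.concat(blocks, keys=list(df.columns))


pct = full_crosstab_pct(df, LEVELS) * 100  # shares -> percentages
print(pct.round(1))
```

Rows for values that never occur (such as b for X2 here) come out as all zeros rather than summing to 100. Passing normalize="columns" or normalize="all" instead gives the other possible readings of "percentage".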