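I am using a package called pysal to run the Theil decomposition below and obtain the within-group and between-group components of the output.
When I run it on a small data frame, it works. See the code below: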
import numpy as np
import pandas as pd
import pysal

path = "/Users/username/Desktop/file1.csv"
df_table = pd.read_table(path, sep=",")
df2 = pd.DataFrame(df_table)
df = df2.sort_values(['exposure'], ascending=True)
rr = np.array(df['exposure'])
drop = pysal.inequality.theil.Theil(rr)
print('drop.T', drop.T)  # total Theil index
dropp = pysal.inequality.theil.TheilD(rr, df['race'])
print('WG', dropp.wg)  # within-group component
print('BG', dropp.bg)  # between-group component
When I try to run the same code on a larger file, I get the following error message:
TypeError: unorderable types: float() < str()
How can I fix this error?
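Below is the source code from the pysal package. The data types appear to be the same in both files, so I am not sure why the code works on the small file but not on the large one.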
def __init__(self, y, partition):
    groups = np.unique(partition)
    T = Theil(y).T
    ytot = y.sum(axis=0)
    #group totals
    gtot = np.array([y[partition == gid].sum(axis=0) for gid in groups])
    mm = np.dot
    if ytot.size == 1: # y is 1-d
        sg = gtot / (ytot * 1.)
        sg.shape = (sg.size, 1)
    else:
        sg = mm(gtot, np.diag(1. / ytot))
    ng = np.array([sum(partition == gid) for gid in groups])
    ng.shape = (ng.size,) # ensure ng is 1-d
    n = y.shape[0]
    # between group inequality
    sg = sg + (sg==0) # handle case when a partition has 0 for sum
    bg = np.multiply(sg, np.log(mm(np.diag(n * 1. / ng), sg))).sum(axis=0)
    self.T = T
    self.bg = bg
    self.wg = T - bg
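
One possible explanation, which I have not confirmed against the large file: the first line of `__init__` calls `np.unique(partition)`, and `np.unique` sorts its input. If the `race` column in the larger file has empty cells, pandas stores them as float NaN alongside the string labels, and Python 3 cannot order a float against a str. The sketch below reproduces that situation with made-up data; the arrays `partition` and `y` are hypothetical stand-ins for `df['race']` and `rr`, and the exact wording of the error message depends on the Python version.

```python
import numpy as np

# Hypothetical labels standing in for df['race']. The missing entry is a
# float NaN, which is how pandas represents an empty cell in a text column.
partition = np.array(["a", "b", np.nan, "a"], dtype=object)
y = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical stand-in for rr

# Flag the entries that are not strings; these are the suspects.
is_label = np.array([isinstance(v, str) for v in partition])
print("non-string labels at positions:", np.flatnonzero(~is_label))

# np.unique sorts its input, so mixed str/float values cannot be ordered.
try:
    np.unique(partition)
except TypeError as exc:
    print("TypeError:", exc)

# One possible remedy (an assumption about what is wanted): keep only the
# rows that have a usable label, then compute the groups.
y_clean = y[is_label]
partition_clean = partition[is_label].astype(str)
groups = np.unique(partition_clean)
print(groups)
```

If this is indeed the cause, the equivalent step on the data frame would be something like `df = df.dropna(subset=['race'])` before building `rr`, and `df['race'].map(type).value_counts()` should show whether the column mixes `str` and `float` values. Whether dropping those rows or filling them with a placeholder label is appropriate depends on the analysis.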