Question

1) How can I find the correlation of the following data set using Python code?

T = [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
P = [ 3480. 7080. 10440. 13200. 16800. 20400. 23880. 27480. 30840. 38040. 41520. 44880. 48480. 52080. 55680. 59280. 62520. 66120. 67580. 69620. 69621.]

2) The input is a CSV file:

2,M,17748,60,60,21768,1460.0,7,2011-04-02 00:00:00,0,B,5,2011-07-22 03:03:00,52.0,1,1992,2011,2011,22,2,7,0,3,4,21768,1992-07-05 00:00:00,26,21768,W,50f38a469cf9c253d600000c,21768
1,M,18002,3,3,1746,3480.0,2,2011-04-07 00:00:00,0,B,5,2011-07-25 01:03:00,123.0,1,1985,2011,2011,25,7,7,0,1,4,1746,1985-02-05 00:00:00,3,1746,D,50f38a469cf9c253d600000d,1746
1,M,18003,3,3,2239,3600.0,1,2011-04-06 00:00:00,0,B,29,2011-07-25 01:03:00,89.0,1,1972,2011,2011,25,6,7,0,1,4,2239,1972-01-29 00:00:00,3,2239,D,50f38a469cf9c253d600000e,2239
1,F,18004,3,3,1965,3360.0,1,2011-04-06 00:00:00,0,B,28,2011-07-25 01:03:00,76.0,1,1955,2011,2011,25,6,7,0,1,4,1965,1955-01-28 00:00:00,3,1965,D,50f38a469cf9c253d600000f,1965

I wrote:

counts_W = defaultdict(int)
counts_D = defaultdict(int)
for row in reader:
    if row[28] == 'W':
        counts_W[row[5]] += 1
        Amt_Wtotal += float(row[6])
        dataW.append(Amt_Wtotal)
    else:
        counts_D[row[5]] += 1
        Amt_Dtotal += float(row[6])
        dataD.append(Amt_Dtotal)
Withdraw_amount = array(counts_W.values())
Withdraw_frequency = array(dataW)
Deposit_amount = array(counts_D.values())
Deposit_frequency = array(dataD)

which produced this output:

Withdraw ==== defaultdict(, {'21768': 1}) [1460.0] count == 1
Deposit ===== defaultdict(, {'2239': 1, '1700': 1, '2458': 1, '2056': 1, '2376': 1, '1965': 1, '1974': 1, '2425': 1, '21768': 1, '2069': 1, '2404': 1, '2402': 1, '1763': 1, '1762': 1, '1910': 1, '1746': 1, '10036': 1, '1903': 1, '2445': 1, '1770': 1}) [3480.0, 7080.0, 10440.0, 13200.0, 16800.0, 20400.0, 23880.0, 27480.0, 30840.0, 38040.0, 41520.0, 44880.0, 48480.0, 52080.0, 55680.0, 59280.0, 62520.0, 66120.0, 67580.0, 69620.0] count == 20

How can I also append the amounts to the dictionary and access them to find the correlation?

3) How can I find the frequency and the amount for each month of the year?

Answer 1

To compute the correlation between two data series, I use scipy.stats. I suggest you look into that package.

From the docs:

pearsonr(x, y) #Pearson correlation coefficient and the p-value for testing
spearmanr(a[, b, axis]) #Spearman rank-order correlation coefficient and the p-value
pointbiserialr(x, y) #Point biserial correlation coefficient and the associated p-value.
kendalltau(x, y[, initial_lexsort]) #Calculates Kendall’s tau, a correlation measure for ordinal data.
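As a minimal sketch of how two of these functions are called, the snippet below correlates the cumulative deposit totals printed in the question against a running deposit count. Pairing the totals with a 1..N count is an assumption made for illustration, not something the question specifies, and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Cumulative deposit totals copied from the output shown in the question.
deposit_totals = np.array([
    3480.0, 7080.0, 10440.0, 13200.0, 16800.0, 20400.0, 23880.0,
    27480.0, 30840.0, 38040.0, 41520.0, 44880.0, 48480.0, 52080.0,
    55680.0, 59280.0, 62520.0, 66120.0, 67580.0, 69620.0,
])

# Assumption: the "frequency" series is the running number of deposits (1..N).
deposit_count = np.arange(1, len(deposit_totals) + 1)

# Both functions return the coefficient first and the p-value second.
r, p_value = stats.pearsonr(deposit_count, deposit_totals)
rho, p_value_rank = stats.spearmanr(deposit_count, deposit_totals)

print("Pearson r:", r, "p-value:", p_value)
print("Spearman rho:", rho, "p-value:", p_value_rank)
```

Because the totals only ever increase, the rank-based Spearman coefficient comes out at 1 (up to rounding), while Pearson's r measures how close the growth is to a straight line.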

There are also frequency-related functions:

cumfreq(a[, numbins, defaultreallimits, weights])   #cumulative frequency histogram
relfreq(a[, numbins, defaultreallimits, weights])   #relative frequency histogram
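For the per-month part of the question, a plain grouping with the standard library may be enough before reaching for histogram functions. The sketch below counts transactions and sums amounts per (year, month, type). It takes the amount column (6) and the type column (28) from the question's own code; treating column 8 as the transaction date is an assumption, since the question does not label its columns. The helper name monthly_totals is hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# The four sample rows from the question, embedded so the sketch is self-contained.
SAMPLE = """\
2,M,17748,60,60,21768,1460.0,7,2011-04-02 00:00:00,0,B,5,2011-07-22 03:03:00,52.0,1,1992,2011,2011,22,2,7,0,3,4,21768,1992-07-05 00:00:00,26,21768,W,50f38a469cf9c253d600000c,21768
1,M,18002,3,3,1746,3480.0,2,2011-04-07 00:00:00,0,B,5,2011-07-25 01:03:00,123.0,1,1985,2011,2011,25,7,7,0,1,4,1746,1985-02-05 00:00:00,3,1746,D,50f38a469cf9c253d600000d,1746
1,M,18003,3,3,2239,3600.0,1,2011-04-06 00:00:00,0,B,29,2011-07-25 01:03:00,89.0,1,1972,2011,2011,25,6,7,0,1,4,2239,1972-01-29 00:00:00,3,2239,D,50f38a469cf9c253d600000e,2239
1,F,18004,3,3,1965,3360.0,1,2011-04-06 00:00:00,0,B,28,2011-07-25 01:03:00,76.0,1,1955,2011,2011,25,6,7,0,1,4,1965,1955-01-28 00:00:00,3,1965,D,50f38a469cf9c253d600000f,1965
"""

DATE_COL = 8     # assumption: this column holds the transaction date
AMOUNT_COL = 6   # amount column, as used in the question's code
TYPE_COL = 28    # 'W' (withdrawal) or 'D' (deposit), as in the question's code


def monthly_totals(lines):
    """Return {(year, month, type): [transaction_count, total_amount]}."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        when = datetime.strptime(row[DATE_COL], "%Y-%m-%d %H:%M:%S")
        key = (when.year, when.month, row[TYPE_COL])
        totals[key][0] += 1
        totals[key][1] += float(row[AMOUNT_COL])
    return dict(totals)


result = monthly_totals(io.StringIO(SAMPLE))
for (year, month, kind), (count, amount) in sorted(result.items()):
    print(year, month, kind, "count =", count, "amount =", amount)
```

With a real file, the same function can be given an open file object instead of the StringIO wrapper.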

Answer 2

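import numpy as np

T = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
P = np.array([3480, 7080, 10440, 13200, 16800, 20400, 23880, 27480, 30840, 38040,
              41520, 44880, 48480, 52080, 55680, 59280, 62520, 66120, 67580, 69620, 69621])
print(T.shape)  # (21,)
print(P.shape)  # (21,)
# Stack the two series as rows; np.corrcoef treats each row as one variable.
t_p = np.stack((T, P))
# Note: T is constant (zero variance), so every entry involving T comes out as nan.
print(np.corrcoef(t_p))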

numpy correlation coefficient
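One caveat with this data: Pearson's coefficient divides by the standard deviation of each series, and T consists only of ones, so its standard deviation is 0 and the result is undefined (nan). The sketch below shows that, then correlates P against the running count np.cumsum(T) instead. Using the running count as the second series is an assumption about what the asker wants, not part of the original answer.

```python
import numpy as np

T = np.ones(21)
P = np.array([3480, 7080, 10440, 13200, 16800, 20400, 23880, 27480, 30840,
              38040, 41520, 44880, 48480, 52080, 55680, 59280, 62520, 66120,
              67580, 69620, 69621], dtype=float)

# A constant series has zero standard deviation, so the coefficient is 0/0.
# errstate silences the floating-point warning NumPy would otherwise emit.
with np.errstate(all="ignore"):
    constant_corr = np.corrcoef(T, P)[0, 1]
print("std of T:", np.std(T), "corr(T, P):", constant_corr)

# Assumption: correlate the cumulative amount with the running transaction count.
running_count = np.cumsum(T)
r = np.corrcoef(running_count, P)[0, 1]
print("corr(running count, P):", r)
```

np.corrcoef returns the full 2x2 matrix; the off-diagonal entry [0, 1] is the coefficient between the two series.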
