I wrote this code to calculate the variance:
import pandas as pd
import xlrd
import numpy as np
import matplotlib.pyplot as plt
credit_card=pd.read_csv("default_of_credit_card_clients_Data.csv",skiprows=1)
print(credit_card.head())
for col in credit_card:
    var[col]=np.var(credit_card(col))
print(var)
I get this error:
Traceback (most recent call last):
  File "C:/Python34/project.py", line 11, in <module>
    var[col] = np.var(credit_card(col))
TypeError: 'DataFrame' object is not callable
Any help would be appreciated.
Answer 0 (score: 6)
It seems you need DataFrame.var. It is normalized by N-1 by default; this can be changed using the ddof argument:
var1 = credit_card.var()
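For reference, the TypeError in the question comes from `credit_card(col)`: parentheses try to call the DataFrame like a function, whereas selecting a column uses square brackets. The following is a minimal sketch of the original loop with that fixed. It assumes every column is numeric, uses a small made-up DataFrame in place of the CSV file, and renames the result dictionary to `variances` (a hypothetical name), since the original `var` was never initialized.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data so the sketch runs on its own.
credit_card = pd.DataFrame({"A": [8, 0, 2, 4, 4], "B": [8, 4, 2, 0, 1]})

variances = {}  # the dictionary must exist before the loop assigns into it
for col in credit_card:
    # Square brackets select a column; credit_card(col) would try to call the DataFrame.
    variances[col] = np.var(credit_card[col].to_numpy())

print(variances)
```

Note that `np.var` on a NumPy array divides by N (ddof=0), so these values will differ from what `credit_card.var()` returns by default.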
Sample:
#random dataframe
np.random.seed(100)
credit_card = pd.DataFrame(np.random.randint(10, size=(5,5)), columns=list('ABCDE'))
print (credit_card)
   A  B  C  D  E
0  8  8  3  7  7
1  0  4  2  5  2
2  2  2  1  0  8
3  4  0  9  6  2
4  4  1  5  3  4
var1 = credit_card.var()
print (var1)
A 8.8
B 10.0
C 10.0
D 7.7
E 7.8
dtype: float64
var2 = credit_card.var(axis=1)
print (var2)
0 4.3
1 3.8
2 9.8
3 12.2
4 2.3
dtype: float64
If you need numpy.var:
print (np.var(credit_card.values, axis=0))
[ 7.04 8. 8. 6.16 6.24]
print (np.var(credit_card.values, axis=1))
[ 3.44 3.04 7.84 9.76 1.84]
The difference is because ddof=1 is the default in pandas, but you can change it to 0:
var1 = credit_card.var(ddof=0)
print (var1)
A 7.04
B 8.00
C 8.00
D 6.16
E 6.24
dtype: float64
var2 = credit_card.var(ddof=0, axis=1)
print (var2)
0 3.44
1 3.04
2 7.84
3 9.76
4 1.84
dtype: float64