How to select and compute a value from a specific variable in a DataFrame using pandas

Time: 2018-02-17 17:46:27

Tags: pandas dataframe selection calculation

I ran the code below and got this:

import pandas as pd
pf=pd.read_csv("https://www.dropbox.com/s/08kuxi50d0xqnfc/demo.csv?dl=1")
x=pf[pf['fuv1'] == 0].count()*100/1892
x
id          0.528541
date        0.528541
count       0.528541
idade       0.528541
site        0.528541
baseline    0.528541
fuv1        0.528541
fuv2        0.475687
fuv3        0.528541
fuv4        0.475687
dtype: float64

All I want is the single result 0.528541, without all the other values above.

How can I do that? Thanks.

3 answers:

Answer 0 (score: 2)

In [282]: pf.loc[pf['fuv1'] == 0, 'id'].count()*100/1892
Out[282]: 0.5285412262156448

Answer 1 (score: 2)

If you want to count the 0 values in the fuv1 column, use sum on the boolean comparison, which counts each True as 1:

print ((pf['fuv1'] == 0).sum())
10

x = (pf['fuv1'] == 0).sum()*100/1892
print (x)
0.528541226216
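
The same idea can be tried without downloading the CSV. The following is a minimal sketch on a made-up five-row frame (the data and the names df, mask, n_zero and pct are assumptions for illustration, not part of the original file). It also divides by len(df) rather than the hard-coded 1892, on the assumption that 1892 is the number of rows in the question's data.

```python
import pandas as pd

# Hypothetical stand-in for the question's CSV: five rows, one of them missing
df = pd.DataFrame({"fuv1": [0.0, 1.0, 0.0, None, 1.0]})

# The comparison yields a boolean Series; NaN == 0 evaluates to False
mask = df["fuv1"] == 0

# Summing booleans counts the True values
n_zero = int(mask.sum())

# Percentage of all rows (including the NaN row) where fuv1 is 0
pct = n_zero * 100 / len(df)

print(n_zero, pct)
```

Because the result is a single scalar rather than a Series, there is nothing else to filter out afterwards.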

Why the outputs differ: count excludes NaN values:

pf=pd.read_csv("https://www.dropbox.com/s/08kuxi50d0xqnfc/demo.csv?dl=1")
x=pf[pf['fuv1'] == 0]
print (x)
    id       date  count  idade site  baseline  fuv1  fuv2  fuv3  fuv4
0    0   4/1/2016     10     13    A         1   0.0   1.0   0.0   1.0
2    2   4/3/2016      9      5    C         1   0.0   NaN   0.0   1.0
3    3   4/4/2016    108     96    D         1   0.0   1.0   0.0   NaN
11  11  4/12/2016      6     13    C         1   0.0   1.0   1.0   0.0
13  13  4/14/2016     12      4    C         1   0.0   1.0   1.0   0.0
40  40  5/11/2016     14      7    C         1   0.0   1.0   1.0   1.0
41  41  5/12/2016      0     26    C         1   0.0   1.0   1.0   1.0
42  42  5/13/2016     10     15    C         1   0.0   1.0   1.0   1.0
60  60  5/31/2016     13      3    D         1   0.0   1.0   1.0   1.0
74  74  6/14/2016     15      7    B         1   0.0   1.0   1.0   1.0

print (x.count())
id          10
date        10
count       10
idade       10
site        10
baseline    10
fuv1        10
fuv2         9
fuv3        10
fuv4         9
dtype: int64
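
A small sketch of the same effect on invented data (the frame and the names df, subset, counts and n_rows are assumptions for illustration only): count reports non-missing values per column, while len gives the number of selected rows regardless of NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: fuv2 has one missing value among the rows where fuv1 is 0
df = pd.DataFrame({
    "fuv1": [0.0, 0.0, 1.0, 0.0],
    "fuv2": [1.0, np.nan, 1.0, 1.0],
})

subset = df[df["fuv1"] == 0]

# Per-column count of non-NaN values in the selected rows
counts = subset.count()

# Number of selected rows, unaffected by NaN in any column
n_rows = len(subset)

print(counts)
print(n_rows)
```

If the goal is simply "how many rows match", len(subset) or the boolean sum avoids depending on which column happens to contain missing values.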

Answer 2 (score: 0)

import pandas as pd

pf=pd.read_csv("https://www.dropbox.com/s/08kuxi50d0xqnfc/demo.csv?dl=1")

x = (pf['fuv1'] == 0).sum()*100/1892
y=pf["idade"].mean()

l = "Performance"
k = "LTFU"


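# Build a two-row table: labels in column 'a', values in column 'b'
def test(l1, k1):
    return pd.DataFrame({'a': [l1, k1], 'b': [x, y]})

df1 = test(l, k)
# Blank out the column and index labels so only the label/value pairs print
df1.columns = [''] * len(df1.columns)
df1.index = [''] * len(df1.index)

# Round the numeric values to 2 decimal places for display
print(round(df1, 2))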

  Performance   0.53
         LTFU  14.13
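
One possible alternative for printing a few labelled scalars is a pandas Series keyed by label. This is only a sketch: the two numbers are copied from the outputs shown in this thread rather than recomputed, and the name summary is an assumption for illustration.

```python
import pandas as pd

# Values taken from the outputs above; in practice they would be the computed x and y
summary = pd.Series({"Performance": 0.528541, "LTFU": 14.13}).round(2)

# to_string() prints label/value pairs without the trailing dtype line
print(summary.to_string())
```

This keeps the labels as the index, so no column or index names need to be blanked out before printing.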