When should I use .loc and when not (Pandas DataFrame)?

Date: 2020-09-10 08:36:12

Tags: python arrays python-3.x pandas dataframe

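Can someone help me understand this? When subsetting a DataFrame, we usually do this:

df.loc[df['col_1']==0]

But when I have to subset based on two column values, or on multiple conditions, we do this instead:

df[(df['col_1']==0) & (df['col_2']>0)]

Why is .loc not used in the second case?

Also, why can't we use and in the second snippet, i.e.

df[(df['col_1']==0) and (df['col_2']>0)] ?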

1 answer:

Answer 0 (score: 1)

Why is .loc not used in the second case?

No, that is a mistaken assumption: both forms work, with or without .loc.

import pandas as pd

df = pd.DataFrame({
    'col_1': [0, 3, 0, 7, 1, 0],
    'col_2': [0, 3, 6, 9, 2, 4],
    'col3': list('aaabbb')
})

print (df.loc[df['col_1']==0])
   col_1  col_2 col3
0      0      0    a
2      0      6    a
5      0      4    b

print (df.loc[(df['col_1']==0) & (df['col_2']>0)])
   col_1  col_2 col3
2      0      6    a
5      0      4    b
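As a quick check of the claim that both spellings work, the sketch below builds the two-condition mask once and compares the results of plain indexing and .loc. The names mask, plain, with_loc and same_rows are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': [0, 3, 0, 7, 1, 0],
    'col_2': [0, 3, 6, 9, 2, 4],
    'col3': list('aaabbb'),
})

# Boolean mask built from two conditions. Each comparison is parenthesised
# because & binds more tightly than == and >.
mask = (df['col_1'] == 0) & (df['col_2'] > 0)

plain = df[mask]         # row filtering without .loc
with_loc = df.loc[mask]  # the same row filtering with .loc

# DataFrame.equals compares shape, values and dtypes.
same_rows = plain.equals(with_loc)
print(same_rows)
print(list(plain.index))
```

For pure row filtering the two forms are interchangeable, so the choice is mostly a matter of style.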

Plain indexing without .loc returns the same rows:

print (df[df['col_1']==0])
   col_1  col_2 col3
0      0      0    a
2      0      6    a
5      0      4    b

print (df[(df['col_1']==0) & (df['col_2']>0)])
   col_1  col_2 col3
2      0      6    a
5      0      4    b

The reason to use .loc is when you also need to select columns by name, e.g. col_2:

print (df.loc[df['col_1']==0, 'col_2'])
0    0
2    6
5    4
Name: col_2, dtype: int64

print (df.loc[(df['col_1']==0) & (df['col_2']>0), 'col_2'])
2    6
5    4
Name: col_2, dtype: int64

If you need to select 2 or more columns, pass a list, e.g. for col_1 and col3 use:

print (df.loc[df['col_1']==0, ['col_1','col3']])
   col_1 col3
0      0    a
2      0    a
5      0    b

print (df.loc[(df['col_1']==0) & (df['col_2']>0), ['col_1','col3']])
   col_1 col3
2      0    a
5      0    b

If you omit loc, it fails:

df[df['col_1']==0, 'col_1']
df[(df['col_1']==0) & (df['col_2']>0), 'col_1']

TypeError

Also, why can't we use and in the second snippet, i.e.

df[(df['col_1']==0) and (df['col_2']>0)] ?

Because and works on scalars: it needs a single True or False from each operand, which a Series of booleans cannot provide. In pandas, use & for the element-wise (bitwise) AND. More information is here
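
The two failing forms discussed above can be reproduced with a minimal sketch. The variable names (left, right, and_error, tuple_key_failed, combined) are illustrative only. The exact exception raised when .loc is omitted may differ between pandas versions, so the sketch catches a broad Exception there instead of assuming a specific type.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': [0, 3, 0, 7, 1, 0],
    'col_2': [0, 3, 6, 9, 2, 4],
    'col3': list('aaabbb'),
})

left = df['col_1'] == 0
right = df['col_2'] > 0

# `and` asks Python for the truth value of the whole left-hand Series,
# which pandas refuses to give because it is ambiguous.
try:
    df[left and right]
    and_error = None
except ValueError as exc:
    and_error = type(exc).__name__

# Without .loc, "mask, column" is passed to df[...] as a single tuple key,
# which is not a valid column label, so the lookup fails.
try:
    df[left, 'col_1']
    tuple_key_failed = False
except Exception:  # exception type is version dependent (assumption)
    tuple_key_failed = True

# & combines the two masks element by element.
combined = left & right

print(and_error)
print(tuple_key_failed)
print(combined.tolist())
```

The same distinction applies to or versus | and to not versus ~ when working with boolean Series.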