Get a random sample from a pandas DataFrame, but only one row per value

Time: 2020-01-01 19:43:04

Tags: python pandas

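My dataset is an athlete dataset in which one of the columns is AthleteName. There are 38 observations, but some athletes competed more than once, so there are 31 distinct athletes. I want to draw a "random" sample of 31 observations in which every athlete appears exactly once; for athletes who appear more than once, the observation to keep should be chosen at random.

I tried something like this, but it gives me an error: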

sample_fem = pd.DataFrame
total = 0
while total <= 31:
    sample = female_dec.sample(n=1, replace=False)
    sample = sample.reset_index()
    if sample["AthleteName"][0] not in sample_fem["AthleteName"]:
       sample_fem.append(sample)
       total +=1 


  File "<ipython-input-561-249bb5b47652>", line 6, in <module>
    if sample["AthleteName"][0] not in sample_fem["AthleteName"]:

TypeError: 'type' object is not subscriptable

1 Answer:

Answer 0: (score: 0)

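It sounds like what you want in your "random sample" is:

  • All records for athletes who appear only once in the data
  • A single, randomly chosen record for each athlete who appears two or more times in the data

To do this, we first build a DataFrame and flag whether each record appears more than once.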


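import pandas as pd
import numpy as np


df = pd.DataFrame({'a':[0,1,2,3,4,4,5,6,2]})
# keep=False marks every occurrence of a repeated value, not just the later ones
df['dup_flag'] = df.duplicated(keep=False)
df
    a   dup_flag
0   0   False
1   1   False
2   2   True
3   3   False
4   4   True
5   4   True
6   5   False
7   6   False
8   2   True

Next, based on the flag variable we created, we split the data into "uniques" and "dups".

uniques = df.loc[~df['dup_flag']]
dups = df.loc[df['dup_flag']]

Before calling drop_duplicates on the dups DataFrame, simply put its index in a random order. Then we can combine the results.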

dups
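
The remaining steps (shuffling the repeated rows, dropping duplicates, and combining the two parts) could be sketched as follows. This is only a minimal sketch on the toy data above, where column `a` stands in for AthleteName; the names `picked` and `result` and the fixed `random_state` are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

# Toy data from above; column 'a' stands in for AthleteName.
df = pd.DataFrame({'a': [0, 1, 2, 3, 4, 4, 5, 6, 2]})
df['dup_flag'] = df.duplicated(subset='a', keep=False)

uniques = df.loc[~df['dup_flag']]
dups = df.loc[df['dup_flag']]

# Shuffle the repeated rows, then keep the first row seen for each value.
# random_state is only here to make the sketch reproducible; drop it for
# a different draw on every run.
picked = dups.sample(frac=1, random_state=0).drop_duplicates(subset='a')

# Combine both parts, restore the original row order, and drop the helper flag.
result = pd.concat([uniques, picked]).sort_index().drop(columns='dup_flag')
print(result)
```

On the data from the question, something like `female_dec.sample(frac=1).drop_duplicates(subset='AthleteName')` should give one randomly chosen row per athlete in a single step, assuming the column is named as described. As for the original TypeError, it arises because `sample_fem = pd.DataFrame` binds the class itself rather than an instance, so `sample_fem["AthleteName"]` tries to subscript a type.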