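My dataset is an athlete dataset, and one of its columns is AthleteName. There are 38 observations, but some athletes competed more than once, so there are only 31 distinct athletes.
I want to draw a "random" sample of 31 observations in which every athlete appears exactly once. For athletes with more than one observation, the one that is kept should be chosen at random.
I tried something like this, but it gives me an error: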
sample_fem = pd.DataFrame
total = 0
while total <= 31:
    sample = female_dec.sample(n=1, replace=False)
    sample = sample.reset_index()
    if sample["AthleteName"][0] not in sample_fem["AthleteName"]:
        sample_fem.append(sample)
        total += 1
File "<ipython-input-561-249bb5b47652>", line 6, in <module>
if sample["AthleteName"][0] not in sample_fem["AthleteName"]:
TypeError: 'type' object is not subscriptable
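The `TypeError` comes from the first line: `sample_fem = pd.DataFrame` binds the DataFrame class itself, not an empty DataFrame, so `sample_fem["AthleteName"]` tries to subscript a type. Once that is fixed, a few more problems appear. `x in series` tests the Series index, not its values. `DataFrame.append` returns a new object rather than modifying in place, and it was removed in pandas 2.0. Finally, `while total <= 31` asks for a 32nd distinct athlete, which does not exist among 31. Below is a minimal sketch of the same loop with these issues fixed. It uses a small hypothetical stand-in for `female_dec`; the `Score` column is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the asker's female_dec (5 rows, 3 distinct athletes).
female_dec = pd.DataFrame({
    "AthleteName": ["Ann", "Bea", "Bea", "Cat", "Cat"],
    "Score": [10, 20, 21, 30, 31],
})

n_athletes = female_dec["AthleteName"].nunique()  # would be 31 on the real data
seen = set()   # athlete names already drawn
rows = []      # one single-row DataFrame per athlete

# Rejection sampling: draw one random row at a time and keep it only if its
# athlete has not been drawn yet; stop once every athlete is covered.
while len(seen) < n_athletes:
    row = female_dec.sample(n=1)
    name = row["AthleteName"].iloc[0]
    if name not in seen:  # membership test on a set of names, not on a Series index
        seen.add(name)
        rows.append(row)

# Build the result once at the end instead of appending inside the loop.
sample_fem = pd.concat(rows, ignore_index=True)
print(sample_fem)
```

This works, but it keeps drawing rows that it then rejects. A vectorised shuffle-and-deduplicate approach, like the one in the answer, is more idiomatic.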
Answer:
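It sounds like what you want in your "random sample" is every athlete exactly once: all observations of athletes who appear only once, plus one randomly chosen observation for each athlete who appears more than once.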
To do that, we first build a DataFrame and flag whether each record appears more than once.
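import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [0, 1, 2, 3, 4, 4, 5, 6, 2]})
# keep=False marks every occurrence of a repeated value, not just the later ones
df['dup_flag'] = df.duplicated(keep=False)
df
   a  dup_flag
0  0     False
1  1     False
2  2      True
3  3     False
4  4      True
5  4      True
6  5     False
7  6     False
8  2      True
Next, based on the flag variable we just created, we split the rows into "uniques" and "dups".
uniques = df.loc[~df['dup_flag']]
dups = df.loc[df['dup_flag']]
Before calling drop_duplicates on the dups DataFrame, just give its index a random order. Then we can combine the results.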
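Below is a minimal end-to-end sketch of these steps (flag, split, shuffle, deduplicate, combine). It runs on a small hypothetical stand-in for the athlete data, with an invented `Score` column. Note that `duplicated()` without arguments compares whole rows. On real athlete data, other columns differ between an athlete's entries, so the flag has to be computed on `AthleteName` via `subset`. The shuffle here uses `DataFrame.sample(frac=1)`, which is one way to put the index in random order, not necessarily the one the answer used.

```python
import pandas as pd

# Hypothetical stand-in for the athlete data: 6 rows, 4 distinct athletes.
df = pd.DataFrame({
    "AthleteName": ["Ann", "Bea", "Bea", "Cat", "Dee", "Dee"],
    "Score": [10, 20, 21, 30, 40, 41],
})

# Step 1: flag every row whose AthleteName occurs more than once.
df["dup_flag"] = df.duplicated(subset="AthleteName", keep=False)

# Step 2: split into athletes seen once and athletes seen several times.
uniques = df.loc[~df["dup_flag"]]
dups = df.loc[df["dup_flag"]]

# Step 3: shuffle the duplicated rows, then keep the first row per athlete,
# which is a uniformly random choice among that athlete's rows.
picked = dups.sample(frac=1).drop_duplicates(subset="AthleteName", keep="first")

# Step 4: combine both parts into one sample with exactly one row per athlete.
sample = pd.concat([uniques, picked]).drop(columns="dup_flag").sort_index()
print(sample)
```

Shuffling and then dropping duplicates already keeps one random row per athlete. Athletes with a single row simply keep it. This means `df.sample(frac=1).drop_duplicates(subset="AthleteName")` on the full frame yields the same kind of sample in one line. On pandas 1.1 or later, `df.groupby("AthleteName").sample(n=1)` is another option. Pass `random_state=` to `sample` if the draw needs to be reproducible.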
dups