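I have a DataFrame in which one column contains a label for each row (alongside some other data for that row). I also have a dictionary whose keys are the possible labels and whose values are 2-tuples of information related to that label. I'd like to add two new columns to my frame, one for each element of the 2-tuple that corresponds to the row's label.
Here is the setup: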
import pandas as pd
import numpy as np
np.random.seed(1)
n = 10
labels = list('abcdef')
colors = ['red', 'green', 'blue']
sizes = ['small', 'medium', 'large']
labeldict = {c: (np.random.choice(colors), np.random.choice(sizes)) for c in labels}
df = pd.DataFrame({'label': np.random.choice(labels, n),
                   'somedata': np.random.randn(n)})
I can get what I want by running:
df['color'], df['size'] = zip(*df['label'].map(labeldict))
print(df)
label somedata color size
0 b 0.196643 red medium
1 c -1.545214 green small
2 a -0.088104 green small
3 c 0.852239 green small
4 b 0.677234 red medium
5 c -0.106878 green small
6 a 0.725274 green small
7 d 0.934889 red medium
8 a 1.118297 green small
9 c 0.055613 green small
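But what if I don't want to type out the two columns on the left-hand side of the assignment by hand? In other words, how can I create multiple new columns dynamically? For example, if labeldict held 10-tuples instead of 2-tuples, the approach as written would be a real pain. Here are a couple of things that don't work: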
# set up attrlist for later use
attrlist = ['color', 'size']
# non-working idea 1)
df[attrlist] = zip(*df['label'].map(labeldict))
# non-working idea 2)
df.loc[:, attrlist] = zip(*df['label'].map(labeldict))
This does work, but it looks like a hack:
for a in attrlist:
    df[a] = 0
df[attrlist] = zip(*df['label'].map(labeldict))
Is there a better solution?
Answer 0 (score: 8)
You can use merge instead:
>>> ld = pd.DataFrame(labeldict).T
>>> ld.columns = ['color', 'size']
>>> ld.index.name = 'label'
>>> df.merge(ld.reset_index(), on='label')
label somedata color size
0 b 1.462108 red medium
1 c -2.060141 green small
2 c 1.133769 green small
3 c 0.042214 green small
4 e -0.322417 red medium
5 e -1.099891 red medium
6 e -0.877858 red medium
7 e 0.582815 red medium
8 f -0.384054 red large
9 d -0.172428 red medium
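The column names in the merge above are still typed out by hand, and the rows in the output appear grouped by label rather than in their original order. A minimal sketch of the same idea driven by attrlist, assuming a pandas version where DataFrame.from_dict accepts columns= together with orient='index'; the small labeldict and df here are made-up illustration data, not the question's random data:

```python
import pandas as pd

# Illustration data (assumed, not the question's random values).
labeldict = {'a': ('green', 'small'),
             'b': ('red', 'medium'),
             'c': ('green', 'small')}
attrlist = ['color', 'size']
df = pd.DataFrame({'label': ['b', 'c', 'a', 'c'],
                   'somedata': [0.1, 0.2, 0.3, 0.4]})

# One row per label, one column per tuple element; the column names
# come from attrlist, so nothing is spelled out by hand.
ld = pd.DataFrame.from_dict(labeldict, orient='index', columns=attrlist)
ld.index.name = 'label'

# how='left' keeps every row of df in its original order.
merged = df.merge(ld.reset_index(), on='label', how='left')
print(merged)
```

With a 10-tuple per label, only attrlist would need to grow; the rest of the code stays the same.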
Answer 1 (score: 7)
Instead of working with labeldict directly, you can turn that information into a DataFrame and then join it with the original:
>>> labeldf = pd.DataFrame([(np.random.choice(colors), np.random.choice(sizes)) for c in labels], columns=['color', 'size'], index=labels)
>>> df.join(labeldf, on='label')
label somedata color size
0 a -1.709973 red medium
1 b 0.099109 blue medium
2 a -0.427323 red medium
3 b 0.474995 blue medium
4 b -2.819208 blue medium
5 d -0.998888 red small
6 b 0.713357 blue medium
7 d 0.331989 red small
8 e -0.906240 green large
9 c -0.501916 blue large
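The snippet above draws fresh random values for labeldf, so its output does not match the question's labeldict. A rough sketch of building labeldf from an existing dictionary instead, using made-up illustration data; the label 'z' is a hypothetical label that is deliberately missing from the dictionary, to show what join does in that case:

```python
import pandas as pd

# Illustration data (assumed); 'z' has no entry in labeldict on purpose.
labeldict = {'a': ('green', 'small'), 'b': ('red', 'medium')}
attrlist = ['color', 'size']
df = pd.DataFrame({'label': ['b', 'a', 'z'],
                   'somedata': [0.1, 0.2, 0.3]})

# Index labeldf by label so that join(on='label') can look each row up.
labeldf = pd.DataFrame(list(labeldict.values()),
                       index=list(labeldict.keys()),
                       columns=attrlist)

# join is a left join by default: df keeps its row order, and labels
# without a dictionary entry get NaN in the new columns.
joined = df.join(labeldf, on='label')
print(joined)
```

By contrast, the zip/map approach from the question would fail on the unknown label, because map produces NaN there and zip(*...) cannot unpack it.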
Answer 2 (score: 0)
If you want to add multiple columns to a DataFrame as part of a method chain, you can use apply. The first step is to create a function that transforms a row, represented as a Series, into the form you want. You can then call apply to use this function on each row.
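The code that originally accompanied this answer is not preserved here, so the following is only a sketch of the approach it describes, not the author's code. The helper name add_attrs and the small data set are assumptions made for illustration:

```python
import pandas as pd

# Illustration data (assumed, not from the question).
labeldict = {'a': ('green', 'small'), 'b': ('red', 'medium')}
attrlist = ['color', 'size']
df = pd.DataFrame({'label': ['b', 'a', 'a'],
                   'somedata': [0.1, 0.3, 0.2]})

def add_attrs(row):
    # `row` is one DataFrame row passed in as a Series.
    # Look up the tuple for this row's label and append it as new fields.
    extra = pd.Series(labeldict[row['label']], index=attrlist)
    return pd.concat([row, extra])

# Because add_attrs returns a Series, apply(axis=1) yields a DataFrame
# whose columns are the index of that Series, so it fits in a chain.
result = (
    df
    .apply(add_attrs, axis=1)
    .sort_values('somedata', ascending=False)
)
print(result)
```

Row-wise apply calls the Python function once per row, so on large frames the merge or join approaches are likely to be faster.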
Answer 3 (score: 0)
Just use result_type='expand' in pandas apply:
df
Out[78]:
a b
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
df[['mean', 'std', 'max']]=df[['a','b']].apply(mathOperationsTuple, axis=1, result_type='expand')
df
Out[80]:
a b mean std max
0 0 1 0.5 0.5 1.0
1 2 3 2.5 0.5 3.0
2 4 5 4.5 0.5 5.0
3 6 7 6.5 0.5 7.0
4 8 9 8.5 0.5 9.0
And some copy-paste code:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(10).reshape(5,2), columns=['a','b'])
print('df',df, sep='\n')
print()
def mathOperationsTuple(arr):
    return np.mean(arr), np.std(arr), np.amax(arr)
df[['mean', 'std', 'max']]=df[['a','b']].apply(mathOperationsTuple, axis=1, result_type='expand')
print('df',df, sep='\n')
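The same result_type='expand' idea can be pointed at the question's label lookup. This is a hedged sketch with made-up illustration data; it names the expanded columns from attrlist and uses join rather than assigning to new columns directly, which is a choice made here to avoid depending on how a particular pandas version handles assignment to several missing columns:

```python
import pandas as pd

# Illustration data (assumed, not the question's random values).
labeldict = {'a': ('green', 'small'), 'b': ('red', 'medium')}
attrlist = ['color', 'size']
df = pd.DataFrame({'label': ['b', 'a', 'b'],
                   'somedata': [0.1, 0.2, 0.3]})

# Each row maps to its tuple; result_type='expand' spreads the tuple
# elements over separate columns (named 0, 1, ... by default).
expanded = df.apply(lambda row: labeldict[row['label']],
                    axis=1, result_type='expand')
expanded.columns = attrlist

# expanded shares df's index, so a plain index join lines the rows up.
result = df.join(expanded)
print(result)
```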