Pandas: assigning multiple *new* columns simultaneously

Date: 2013-12-29 20:30:25

Tags: python pandas

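I have a DataFrame with a column containing a label for each row (in addition to some relevant data for each row). I have a dictionary whose keys are the possible labels and whose values are 2-tuples of information related to that label. I'd like to add two new columns to my frame, one for each part of the 2-tuple corresponding to each row's label.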

Here is the setup:

import pandas as pd
import numpy as np

np.random.seed(1)
n = 10

labels = list('abcdef')
colors = ['red', 'green', 'blue']
sizes = ['small', 'medium', 'large']

labeldict = {c: (np.random.choice(colors), np.random.choice(sizes)) for c in labels}

df = pd.DataFrame({'label': np.random.choice(labels, n), 
                   'somedata': np.random.randn(n)})

I can get what I want by running:

df['color'], df['size'] = zip(*df['label'].map(labeldict))
print(df)

  label  somedata  color    size
0     b  0.196643    red  medium
1     c -1.545214  green   small
2     a -0.088104  green   small
3     c  0.852239  green   small
4     b  0.677234    red  medium
5     c -0.106878  green   small
6     a  0.725274  green   small
7     d  0.934889    red  medium
8     a  1.118297  green   small
9     c  0.055613  green   small
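
The one-liner works because zip(*...) transposes the mapped Series: each element of df['label'].map(labeldict) is a 2-tuple, and unpacking those tuples into zip regroups them into one tuple per attribute. A small standalone illustration, using hypothetical rows rather than values from the frame above:

```python
# zip(*rows) "transposes" a sequence of equal-length tuples:
# the i-th output tuple collects the i-th element of every row.
rows = [('red', 'medium'), ('green', 'small'), ('green', 'small')]

colors_col, sizes_col = zip(*rows)

print(colors_col)  # ('red', 'green', 'green')
print(sizes_col)   # ('medium', 'small', 'small')
```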

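But what if I don't want to type out the column names on the left-hand side of the assignment by hand? That is, how can I create multiple new columns dynamically? For example, if labeldict held 10-tuples instead of 2-tuples, the code as currently written would be a real pain. Here are some things that don't work: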

# set up attrlist for later use
attrlist = ['color', 'size']

# non-working idea 1)
df[attrlist] = zip(*df['label'].map(labeldict))

# non-working idea 2)
df.loc[:, attrlist] = zip(*df['label'].map(labeldict))

This does work, but it looks like a hack:

for a in attrlist:
    df[a] = 0
df[attrlist] = zip(*df['label'].map(labeldict))
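
For comparison, a variant of the same zip idea that avoids pre-creating the columns: loop over attrlist together with the transposed tuples and assign one column at a time. This is a minimal sketch; labeldict, df and attrlist below are small hypothetical stand-ins for the question's objects, and the loop works for tuples of any length.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's labeldict / df / attrlist.
labeldict = {'a': ('green', 'small'), 'b': ('red', 'medium'), 'c': ('blue', 'large')}
df = pd.DataFrame({'label': ['b', 'a', 'c', 'a'], 'somedata': np.arange(4.0)})
attrlist = ['color', 'size']

# Transpose the mapped tuples once, then pair each resulting
# tuple with its column name and assign it as a new column.
for name, values in zip(attrlist, zip(*df['label'].map(labeldict))):
    df[name] = list(values)

print(df)
```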

Any better solutions?

4 answers:

Answer 0 (score: 8)

You can use merge instead:

>>> ld = pd.DataFrame(labeldict).T
>>> ld.columns = ['color', 'size']
>>> ld.index.name = 'label'
>>> df.merge(ld.reset_index(), on='label')
  label  somedata  color    size
0     b  1.462108    red  medium
1     c -2.060141  green   small
2     c  1.133769  green   small
3     c  0.042214  green   small
4     e -0.322417    red  medium
5     e -1.099891    red  medium
6     e -0.877858    red  medium
7     e  0.582815    red  medium
8     f -0.384054    red   large
9     d -0.172428    red  medium
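
A self-contained version of this answer, as a sketch with small hypothetical data in place of the random setup. It builds the lookup table with pd.DataFrame.from_dict(..., orient='index', columns=...) so the new column names come straight from a list, and it passes how='left' (the answer itself uses the default inner merge) so that every row of df is kept, in its original order, even if its label were missing from labeldict:

```python
import pandas as pd

# Hypothetical small data in the shape of the question's objects.
labeldict = {'a': ('green', 'small'), 'b': ('red', 'medium'), 'c': ('blue', 'large')}
df = pd.DataFrame({'label': ['b', 'a', 'c', 'a'], 'somedata': [0.1, 0.2, 0.3, 0.4]})
attrlist = ['color', 'size']

# Lookup table: one row per label, one column per tuple element.
ld = pd.DataFrame.from_dict(labeldict, orient='index', columns=attrlist)
ld.index.name = 'label'

# how='left' keeps all rows of df in their original order.
result = df.merge(ld.reset_index(), on='label', how='left')
print(result)
```

Note that merge returns a new frame with a fresh default index rather than df's original index.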

Answer 1 (score: 7)

Instead of working with labeldict directly, you can convert that information into a DataFrame and then join it with the original one:

>>> labeldf = pd.DataFrame([(np.random.choice(colors), np.random.choice(sizes)) for c in labels], columns=['color', 'size'], index=labels)
>>> df.join(labeldf, on='label')
  label  somedata  color    size
0     a -1.709973    red  medium
1     b  0.099109   blue  medium
2     a -0.427323    red  medium
3     b  0.474995   blue  medium
4     b -2.819208   blue  medium
5     d -0.998888    red   small
6     b  0.713357   blue  medium
7     d  0.331989    red   small
8     e -0.906240  green   large
9     c -0.501916   blue   large
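
The answer draws fresh random values when building labeldf, so its table does not match the question's labeldict (which is why the colors and sizes above differ). A sketch that builds the frame from an existing labeldict instead, with small hypothetical data; join(on='label') matches df['label'] against labeldf's index and, per the pandas documentation, preserves df's own index:

```python
import pandas as pd

# Hypothetical small data; the non-default index shows that join keeps it.
labeldict = {'a': ('green', 'small'), 'b': ('red', 'medium'), 'c': ('blue', 'large')}
df = pd.DataFrame({'label': ['b', 'a', 'c', 'a'], 'somedata': [0.1, 0.2, 0.3, 0.4]},
                  index=[10, 11, 12, 13])
attrlist = ['color', 'size']

# One row per label, one column per tuple element.
labeldf = pd.DataFrame.from_dict(labeldict, orient='index', columns=attrlist)

# Left join of df['label'] against labeldf's index.
result = df.join(labeldf, on='label')
print(result)
```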

Answer 2 (score: 0)

If you want to add multiple columns to a DataFrame as part of a method chain, you can use apply. The first step is to create a function that transforms a row, represented as a Series, into the desired form. Then you can call apply to use this function on each row.
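
The code for this answer did not survive, so here is a minimal sketch of the approach it describes, not the author's original code. The helper name label_attributes is hypothetical, and the data is a small stand-in for the question's. Because the function returns a Series, apply(axis=1) builds a DataFrame whose columns are that Series' index; pipe keeps the whole thing inside one chain:

```python
import pandas as pd

labeldict = {'a': ('green', 'small'), 'b': ('red', 'medium'), 'c': ('blue', 'large')}
attrlist = ['color', 'size']

def label_attributes(row):
    """Hypothetical helper: map one row (a Series) to a Series of new columns."""
    return pd.Series(labeldict[row['label']], index=attrlist)

result = (
    pd.DataFrame({'label': ['b', 'a', 'c', 'a'], 'somedata': [0.1, 0.2, 0.3, 0.4]})
    # apply(axis=1) calls label_attributes once per row; join attaches
    # the resulting columns by index.
    .pipe(lambda d: d.join(d.apply(label_attributes, axis=1)))
)
print(result)
```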


Answer 3 (score: 0)

Just use result_type='expand' in pandas apply:

df
Out[78]: 
   a  b
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9

df[['mean', 'std', 'max']]=df[['a','b']].apply(mathOperationsTuple, axis=1, result_type='expand')

df
Out[80]: 
   a  b  mean  std  max
0  0  1   0.5  0.5  1.0
1  2  3   2.5  0.5  3.0
2  4  5   4.5  0.5  5.0
3  6  7   6.5  0.5  7.0
4  8  9   8.5  0.5  9.0

And here is some copy-paste code:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(10).reshape(5,2), columns=['a','b'])
print('df',df, sep='\n')
print()
def mathOperationsTuple(arr):
    return np.mean(arr), np.std(arr), np.amax(arr)

df[['mean', 'std', 'max']]=df[['a','b']].apply(mathOperationsTuple, axis=1, result_type='expand')
print('df',df, sep='\n')
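
Applied to the question's label lookup, the same result_type='expand' idea might look like the sketch below (small hypothetical data again). The lambda returns a plain tuple, expand turns each tuple into a row of a new DataFrame with columns 0..k-1, and those columns are then renamed from attrlist and joined back by index:

```python
import pandas as pd

labeldict = {'a': ('green', 'small'), 'b': ('red', 'medium'), 'c': ('blue', 'large')}
df = pd.DataFrame({'label': ['b', 'a', 'c', 'a'], 'somedata': [0.1, 0.2, 0.3, 0.4]})
attrlist = ['color', 'size']

# Each row's label is looked up once; the returned tuple is expanded into columns.
expanded = df.apply(lambda row: labeldict[row['label']], axis=1, result_type='expand')

# Name the new columns, then attach them by index.
expanded.columns = attrlist
df = df.join(expanded)
print(df)
```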