Setting values for multiple rows in a pandas DataFrame

Time: 2017-05-17 09:37:15

Tags: python pandas

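I select specific rows with

merged.loc[newsletters['Datum & Uhrzeit'], 'newsletters']

and I would like to set each of these rows to the corresponding value in newsletters['Advertiser'].

For some reason, the following does not modify merged:

merged.loc[newsletters['Datum & Uhrzeit'], 'newsletters'] = newsletters['Advertiser']

How can I set specific rows of one column to their corresponding values in a single step?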

merged.head(5)
Out[208]:
            temp    week_day    commissions newsletters num_empfs
date                    
2017-01-12  6.0     Sun         64587        NaN    NaN
2017-01-13  11.0    Mon         12668        NaN    NaN
2017-01-18  11.0    Tue         11842        NaN    NaN
2016-02-03  8.0     Wed         85861        NaN    NaN
2016-02-04  5.0     Thu         4265         NaN    NaN

newsletters.head(5)
Out[209]:
   Advertiser  Datum & Uhrzeit  # Empfnger
0  Vodafone    2017-01-12       48145
1  DeinHandy   2017-01-13       4751
2  Vodafone    2017-01-18       61234

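I want to set the specific rows of the 'newsletters' column (the ones listed in newsletters['Datum & Uhrzeit']) to the values stored in newsletters['Advertiser']. Every date in newsletters['Datum & Uhrzeit'] appears in the index of merged.

The output should be: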

            temp    week_day    commissions newsletters num_empfs
date                    
2017-01-12  6.0     Sun         64587        Vodafone   NaN
2017-01-13  11.0    Mon         12668        DeinHandy  NaN
2017-01-18  11.0    Tue         11842        Vodafone   NaN
2016-02-03  8.0     Wed         85861        NaN    NaN
2016-02-04  5.0     Thu         4265         NaN    NaN

1 Answer:

Answer 0: (score: 0)

import numpy as np
import pandas as pd

# Define the data
# -----------------------------------------------------------------------------
newsletters = pd.DataFrame({
    'Advertiser': ['Vodafone', 'DeinHandy', 'Vodafone'],
    'Datum & Uhrzeit': ['2017-01-12', '2017-01-13', '2017-01-18'],
    'Empfnger': [48145, 4751, 61234]})

merged = pd.DataFrame({
    'date': ['2017-01-12', '2017-01-13', '2017-01-18',
             '2016-02-03', '2016-02-04'],
    'temp': [6.0, 11.0, 11.0, 8.0, 5.0],
    'week_day': ['Sun', 'Mon', 'Tue', 'Wed', 'Thu'],
    'commissions': [64587, 12668, 11842, 85861, 4265],
    'newsletters': [np.nan] * 5,
    'num_empfs': [np.nan] * 5})

# Solution
# -----------------------------------------------------------------------------
merged = merged.merge(
    # Rename 'Datum & Uhrzeit' to 'date' so that we can join the two
    # DataFrames on common 'date' column. We could also join on different
    # columns using left_on & right_on parameters of merge but it would
    # result in two date columns in the merged df.
    right=newsletters.rename(
        columns={'Datum & Uhrzeit': 'date'})[['date', 'Advertiser']],
    on='date', how='left')

if merged['newsletters'].isnull().all():
    # If the 'newsletters' column is full of NaNs we can just drop it
    # and keep 'Advertiser' column instead:
    merged = merged.drop('newsletters', axis=1).rename(columns={
        'Advertiser': 'newsletters'})
else:
    # If 'newsletters' already holds some non-NaN values, copy only the
    # non-NaN 'Advertiser' values into 'newsletters' and drop 'Advertiser':
    merged.loc[merged['Advertiser'].notnull(), 'newsletters'] = merged.loc[
               merged['Advertiser'].notnull(), 'Advertiser']
    merged = merged.drop('Advertiser', axis=1)

Verification: print(merged)

   commissions        date newsletters  num_empfs  temp week_day
0        64587  2017-01-12    Vodafone        NaN   6.0      Sun
1        12668  2017-01-13   DeinHandy        NaN  11.0      Mon
2        11842  2017-01-18    Vodafone        NaN  11.0      Tue
3        85861  2016-02-03         NaN        NaN   8.0      Wed
4         4265  2016-02-04         NaN        NaN   5.0      Thu
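
As for why the assignment in the question seems to do nothing: when the right-hand side of a .loc assignment is a Series, pandas aligns it on index labels before writing. newsletters['Advertiser'] is indexed 0, 1, 2, while the selected rows of merged are labelled by date, so no label matches and NaN is written back. Below is a minimal sketch of that behaviour and two possible ways around it. It assumes 'date' is the index of merged (as in the question), that the dates are stored as strings, and that 'newsletters' has object dtype; the names dates, attempt, fixed, aligned and by_date are only illustrative.

```python
import numpy as np
import pandas as pd

newsletters = pd.DataFrame({
    'Advertiser': ['Vodafone', 'DeinHandy', 'Vodafone'],
    'Datum & Uhrzeit': ['2017-01-12', '2017-01-13', '2017-01-18']})

# Assumption: 'date' is the index and holds strings; 'newsletters' is
# created with object dtype so that it can later hold strings.
merged = pd.DataFrame(
    {'temp': [6.0, 11.0, 11.0, 8.0, 5.0],
     'newsletters': np.array([np.nan] * 5, dtype=object)},
    index=pd.Index(['2017-01-12', '2017-01-13', '2017-01-18',
                    '2016-02-03', '2016-02-04'], name='date'))

dates = newsletters['Datum & Uhrzeit']

# 1) The assignment from the question: the Series on the right is aligned
#    by label (0, 1, 2 vs. dates), nothing matches, so NaN is written.
attempt = merged.copy()
attempt.loc[dates, 'newsletters'] = newsletters['Advertiser']

# 2) Passing a plain array skips alignment; values are written by position.
fixed = merged.copy()
fixed.loc[dates, 'newsletters'] = newsletters['Advertiser'].to_numpy()

# 3) Alternatively, give the Series a date index so alignment does the work.
by_date = newsletters.set_index('Datum & Uhrzeit')['Advertiser']
aligned = merged.copy()
aligned.loc[by_date.index, 'newsletters'] = by_date

print(fixed)
```

Variant 2 relies on the row order of newsletters matching the order of the selected labels, which holds here because both come from the same frame. Variant 3 does not depend on order, but it assumes each date occurs only once in newsletters.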