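The DataFrame below is populated by `pd.read_sql`. For each unique Group/SubGroup pair, how can I take the `wf` value from the row where `book_date == start_date` and store it in a column named `new`?
For clarity, I've marked the matching rows with asterisks (*); the asterisks are not part of the dataset.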
| | Group | SubGroup | book_date | start_date | wf | co2 | new |
|-------|-------|-----------|-----------|------------|------|----------|-----|
| 236 | Virgo | Milkyway | 3/1/1985 | 5/1/1985 | 0.04 | NaN | |
| 239 | Virgo | Milkyway | 4/1/1985 | 5/1/1985 | 0.05 | NaN | |
| 1178 | Virgo*| Milkyway* | 5/1/1985* | 5/1/1985* | 0.06*| 0.004179*| |
| 535 | Virgo | Milkyway | 6/1/1985 | 5/1/1985 | 0.07 | 0.008245 | |
| 1056 | Virgo | Andromeda | 6/1/1993 | 8/1/1993 | 1.57 | NaN | |
| 1046 | Virgo | Andromeda | 7/1/1993 | 8/1/1993 | 1.58 | NaN | |
| 956 | Virgo*| Andromeda*| 8/1/1993* | 8/1/1993* | 1.59*| 0.006688*| |
| 776 | Virgo | Andromeda | 9/1/1993 | 8/1/1993 | 1.60 | 0.012917 | |
This is the expected result:
| | Group | SubGroup | book_date | start_date | wf | co2 | new |
|-------|-------|-----------|-----------|------------|------|----------|------|
| 236 | Virgo | Milkyway | 3/1/1985 | 5/1/1985 | 0.04 | NaN | 0.06 |
| 239 | Virgo | Milkyway | 4/1/1985 | 5/1/1985 | 0.05 | NaN | 0.06 |
| 1178 | Virgo*| Milkyway* | 5/1/1985* | 5/1/1985* | 0.06*| 0.004179*| 0.06 |
| 535 | Virgo | Milkyway | 6/1/1985 | 5/1/1985 | 0.07 | 0.008245 | 0.06 |
| 1056 | Virgo | Andromeda | 6/1/1993 | 8/1/1993 | 1.57 | NaN | 1.59 |
| 1046 | Virgo | Andromeda | 7/1/1993 | 8/1/1993 | 1.58 | NaN | 1.59 |
| 956 | Virgo*| Andromeda*| 8/1/1993* | 8/1/1993* | 1.59*| 0.006688*| 1.59 |
| 776 | Virgo | Andromeda | 9/1/1993 | 8/1/1993 | 1.60 | 0.012917 | 1.59 |
Answer 0 (score: 0)
I would first create the new column from the condition, then fill it within each group:
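```python
import numpy as np
# Keep wf only on rows where book_date == start_date; NaN everywhere else
df['new'] = np.where(df['book_date'] == df['start_date'], df['wf'], np.nan)
df['new'] = df.groupby(['Group', 'SubGroup'])['new'].transform(lambda x: x.ffill().bfill())  # spread within each pair, original index kept
```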
```
      Group   SubGroup  book_date  start_date    wf       co2   new
236   Virgo   Milkyway   3/1/1985    5/1/1985  0.04       NaN  0.06
239   Virgo   Milkyway   4/1/1985    5/1/1985  0.05       NaN  0.06
1178  Virgo   Milkyway   5/1/1985    5/1/1985  0.06  0.004179  0.06
535   Virgo   Milkyway   6/1/1985    5/1/1985  0.07  0.008245  0.06
1056  Virgo  Andromeda   6/1/1993    8/1/1993  1.57       NaN  1.59
1046  Virgo  Andromeda   7/1/1993    8/1/1993  1.58       NaN  1.59
956   Virgo  Andromeda   8/1/1993    8/1/1993  1.59  0.006688  1.59
776   Virgo  Andromeda   9/1/1993    8/1/1993  1.60  0.012917  1.59
```
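A more compact variant of the same idea, sketched under the assumption that each Group/SubGroup pair has at most one row where the dates match: mask `wf` with `Series.where`, then broadcast each group's first non-null value with `groupby(...).transform('first')`, which skips NaN. The sample frame is re-typed by hand from the question's table (asterisks removed) and stands in for the `read_sql` result.

```python
import pandas as pd

# Hand-typed stand-in for the read_sql result (asterisks dropped), original index kept
df = pd.DataFrame(
    {
        "Group": ["Virgo"] * 8,
        "SubGroup": ["Milkyway"] * 4 + ["Andromeda"] * 4,
        "book_date": ["3/1/1985", "4/1/1985", "5/1/1985", "6/1/1985",
                      "6/1/1993", "7/1/1993", "8/1/1993", "9/1/1993"],
        "start_date": ["5/1/1985"] * 4 + ["8/1/1993"] * 4,
        "wf": [0.04, 0.05, 0.06, 0.07, 1.57, 1.58, 1.59, 1.60],
    },
    index=[236, 239, 1178, 535, 1056, 1046, 956, 776],
)

# wf where the dates match, NaN everywhere else
matched = df["wf"].where(df["book_date"] == df["start_date"])

# 'first' ignores NaN, so every row receives its pair's matched wf value
df["new"] = matched.groupby([df["Group"], df["SubGroup"]]).transform("first")

print(df[["Group", "SubGroup", "wf", "new"]])
```

If a pair ever had several matching rows, this keeps the first one in row order; a pair with no match would get NaN.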
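Answer 1 (score: 0)
Load your data:
```python
import pandas as pd

df = pd.read_clipboard()  # the question's table, copied to the clipboard
df.replace({r'\*': ''}, regex=True, inplace=True)  # strip the marker asterisks
```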
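```python
def gen_new_col(frame):
    # wf on the row where the two dates match, a placeholder string otherwise
    if frame['book_date'] == frame['start_date']:
        return frame['wf']
    return 'ignore'

df['new_col'] = df.apply(gen_new_col, axis=1)
df['g_subg'] = df['Group'] + "|" + df['SubGroup']  # one key per Group/SubGroup pair
df
```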
```
   Group   SubGroup book_date start_date    wf       co2 new_col           g_subg
0  Virgo   Milkyway  3/1/1985   5/1/1985  0.04       NaN  ignore   Virgo|Milkyway
1  Virgo   Milkyway  4/1/1985   5/1/1985  0.05       NaN  ignore   Virgo|Milkyway
2  Virgo   Milkyway  5/1/1985   5/1/1985  0.06  0.004179    0.06   Virgo|Milkyway
3  Virgo   Milkyway  6/1/1985   5/1/1985  0.07  0.008245  ignore   Virgo|Milkyway
4  Virgo  Andromeda  6/1/1993   8/1/1993  1.57       NaN  ignore  Virgo|Andromeda
5  Virgo  Andromeda  7/1/1993   8/1/1993  1.58       NaN  ignore  Virgo|Andromeda
6  Virgo  Andromeda  8/1/1993   8/1/1993  1.59  0.006688    1.59  Virgo|Andromeda
7  Virgo  Andromeda  9/1/1993   8/1/1993   1.6  0.012917  ignore  Virgo|Andromeda
```
```python
# Get a lookup
valid = df[df['new_col'] != 'ignore']
lookup = dict(zip(valid['g_subg'], valid['new_col']))
lookup
```
```
{'Virgo|Andromeda': '1.59', 'Virgo|Milkyway': '0.06'}
```
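The concatenated `g_subg` key and the dict can also be replaced by a Series indexed on the (Group, SubGroup) pair itself, joined back with `DataFrame.join(..., on=[...])`. This is a sketch that assumes one matching row per pair (`drop_duplicates` keeps the first if there are more); the sample frame is a hand-typed stand-in for the cleaned clipboard data, and the column name `final_value` simply mirrors the answer.

```python
import pandas as pd

# Hand-typed stand-in for the cleaned data (asterisks removed, wf already numeric)
df = pd.DataFrame(
    {
        "Group": ["Virgo"] * 8,
        "SubGroup": ["Milkyway"] * 4 + ["Andromeda"] * 4,
        "book_date": ["3/1/1985", "4/1/1985", "5/1/1985", "6/1/1985",
                      "6/1/1993", "7/1/1993", "8/1/1993", "9/1/1993"],
        "start_date": ["5/1/1985"] * 4 + ["8/1/1993"] * 4,
        "wf": [0.04, 0.05, 0.06, 0.07, 1.57, 1.58, 1.59, 1.60],
    }
)

mask = df["book_date"] == df["start_date"]

# One wf value per (Group, SubGroup), indexed by the pair rather than a joined string
lookup = (
    df.loc[mask, ["Group", "SubGroup", "wf"]]
    .drop_duplicates(subset=["Group", "SubGroup"])  # keep the first match per pair
    .set_index(["Group", "SubGroup"])["wf"]
    .rename("final_value")  # join() needs a named Series
)

# Left join on the two key columns; the caller's index is kept, unmatched pairs get NaN
df = df.join(lookup, on=["Group", "SubGroup"])
print(df)
```

Unlike `lookup[x]` inside a lambda, a pair without a matching row does not raise `KeyError` here.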
```python
# Bring it back in; map() with a dict gives NaN for a pair that has no matching row
df['final_value'] = df['g_subg'].map(lookup)
df
```
```
   Group   SubGroup book_date start_date    wf       co2 new_col           g_subg final_value
0  Virgo   Milkyway  3/1/1985   5/1/1985  0.04       NaN  ignore   Virgo|Milkyway        0.06
1  Virgo   Milkyway  4/1/1985   5/1/1985  0.05       NaN  ignore   Virgo|Milkyway        0.06
2  Virgo   Milkyway  5/1/1985   5/1/1985  0.06  0.004179    0.06   Virgo|Milkyway        0.06
3  Virgo   Milkyway  6/1/1985   5/1/1985  0.07  0.008245  ignore   Virgo|Milkyway        0.06
4  Virgo  Andromeda  6/1/1993   8/1/1993  1.57       NaN  ignore  Virgo|Andromeda        1.59
5  Virgo  Andromeda  7/1/1993   8/1/1993  1.58       NaN  ignore  Virgo|Andromeda        1.59
6  Virgo  Andromeda  8/1/1993   8/1/1993  1.59  0.006688    1.59  Virgo|Andromeda        1.59
7  Virgo  Andromeda  9/1/1993   8/1/1993   1.6  0.012917  ignore  Virgo|Andromeda        1.59
```
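In the lookup output the values appear as strings ('1.59', '0.06'), which suggests `read_clipboard` parsed `wf` as text because of the asterisks, and it stays text after the replace. A sketch of converting the cleaned columns back to proper dtypes, assuming the dates are month/day/year; the two-row frame below is a hypothetical stand-in for the clipboard data.

```python
import pandas as pd

# Hypothetical stand-in for clipboard data after stripping asterisks: everything is text
df = pd.DataFrame(
    {
        "book_date": ["4/1/1985", "5/1/1985"],
        "start_date": ["5/1/1985", "5/1/1985"],
        "wf": ["0.05", "0.06"],
    }
)

df["wf"] = pd.to_numeric(df["wf"])  # '0.06' -> 0.06 (float64)

# Explicit format so 5/1/1985 is read as May 1, not January 5
df["book_date"] = pd.to_datetime(df["book_date"], format="%m/%d/%Y")
df["start_date"] = pd.to_datetime(df["start_date"], format="%m/%d/%Y")

print(df.dtypes)
```

With real datetimes the `book_date == start_date` comparison no longer depends on both columns being formatted identically as strings.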