Replace values in a Python pandas DataFrame with values from a second DataFrame based on a condition

Time: 2013-12-09 16:24:22

Tags: python dataframe

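I'm new to Python, since I usually script in R, so I'm still learning how to work with pandas DataFrames and their nuances.

I have two lists of dicts that I converted to DataFrames, because I thought they would be easier to work with in that format.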

df1= [{u'test': u'SAT Math', u'25th_percentile': None, u'75th_percentile': None, u'50th_percentile': None, u'mean': 404}, {u'test': u'SAT Verbal', u'25th_percentile': None, u'75th_percentile': None, u'50th_percentile': None, u'mean': 355}, {u'test': u'SAT Writing', u'25th_percentile': None, u'75th_percentile': None, u'50th_percentile': None, u'mean': 363}, {u'test': u'SAT Composite', u'25th_percentile': None, u'75th_percentile': None, u'50th_percentile': None, u'mean': 1122}, {u'test': u'ACT Math', u'25th_percentile': None, u'75th_percentile': None, u'50th_percentile': None, u'mean': None}, {u'test': u'ACT English', u'25th_percentile': None, u'75th_percentile': None, u'50th_percentile': None, u'mean': None}, {u'test': u'ACT Reading', u'25th_percentile': None, u'75th_percentile': None, u'50th_percentile': None, u'mean': None}, {u'test': u'ACT Science', u'25th_percentile': None, u'75th_percentile': None, u'50th_percentile': None, u'mean': None}, {u'test': u'ACT Composite', u'25th_percentile': None, u'75th_percentile': None, u'50th_percentile': None, u'mean': None}]


df2 = [{u'test': u'SAT Composite', u'mean': 1981}, {u'test': u'ACT Composite', u'mean': 29.6}]

Then I converted these to DataFrames:

from pandas import DataFrame

df1new = DataFrame(df1, columns=['test', '25th_percentile', 'mean', '50th_percentile','75th_percentile'])
df2new = DataFrame(df2)

Now I would like to replace the contents of the 'mean' column in df1new with the value from df2new if 'test' == "ACT Composite" and 'mean' is None.

I tried the combine_first method, but I believe it requires the DataFrames to be indexed more similarly. I also tried:

if df1new['test'] == "ACT Composite" and df1new['mean'] == None:
            df1new['mean'] == df2new['mean']

as well as .replace() variants.
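For reference, a minimal sketch of why the `if` attempt above cannot work as written. It assumes a shortened, hypothetical stand-in for df1new (the `scores` frame is not from the original data): comparing a column to a value gives one boolean per row, so `if` has no single truth value to test, and the `==` in the body only compares and never assigns.

```python
import pandas as pd

# Hypothetical, shortened stand-in for df1new from the question.
scores = pd.DataFrame(
    [
        {"test": "SAT Composite", "mean": 1122},
        {"test": "ACT Composite", "mean": None},
    ]
)

# `if <Series>:` needs a single True/False, but the comparison yields one
# boolean per row, so pandas raises ValueError.
try:
    if scores["test"] == "ACT Composite":
        pass
    raised = False
except ValueError:
    raised = True

# Element-wise alternative: combine the per-row conditions with `&` and
# test for missing values with isnull() instead of `== None`.
mask = (scores["test"] == "ACT Composite") & scores["mean"].isnull()
print(raised, mask.tolist())
```

The resulting boolean mask can then be used to select the rows to overwrite.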

Any suggestions would be greatly appreciated! Thanks in advance!

1 Answer:

Answer 0: (score: 1)

Maybe this:

idx = (df1new.test == 'ACT Composite') & df1new['mean'].isnull()
# .loc avoids chained assignment, which may write to a temporary copy
df1new.loc[idx, 'mean'] = df2new['mean'][1]

I added the [1] there because I think that is what you want: the mean in df2new that corresponds to ACT Composite. It could also be written as

# .iloc[0] extracts the scalar so the assignment is not aligned on row labels
df1new.loc[idx, 'mean'] = df2new.loc[df2new.test == 'ACT Composite', 'mean'].iloc[0]
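
If the goal is broader than the single ACT Composite row, one possible generalization is to fill every missing mean from df2new by matching on the test name. The sketch below is an assumption about what might be wanted, not part of the original answer, and it uses shortened, hypothetical versions of the two frames. The `lookup` name is introduced only for illustration.

```python
import pandas as pd

# Shortened, hypothetical versions of the frames from the question.
df1new = pd.DataFrame(
    [
        {"test": "SAT Composite", "mean": 1122},
        {"test": "ACT Math", "mean": None},
        {"test": "ACT Composite", "mean": None},
    ]
)
df2new = pd.DataFrame(
    [
        {"test": "SAT Composite", "mean": 1981},
        {"test": "ACT Composite", "mean": 29.6},
    ]
)

# Series of replacement means keyed by test name (test names must be unique).
lookup = df2new.set_index("test")["mean"]

# Map each row's test name to its replacement (NaN when df2new has no match),
# then use it only where the existing mean is missing.
df1new["mean"] = df1new["mean"].fillna(df1new["test"].map(lookup))
print(df1new)
```

Existing values such as the SAT Composite mean are left untouched, and rows without a match in df2new stay missing. Indexing both frames by test in this way is also the kind of shared index that combine_first, mentioned in the question, relies on.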