Replace column values based on the value in another column

Time: 2020-10-04 16:58:48

Tags: python pandas

So far, my dataframe looks like this:

ID   Area   Stage
1    P      X
2    Q      X
3    P      X
4    Q      Y

For every row where Stage equals "X", I want to replace the Area "Q" with "P".

So the result should look like this:

ID   Area   Stage
1    P      X
2    P      X
3    P      X
4    Q      Y

I tried:

data.query('Stage in ["X"]')['Area']=data.query('Stage in ["X"]')['Area'].replace('Q','P')

It doesn't work. Thanks for any help! :)

5 answers:

Answer 0: (score: 4)

You can combine two boolean conditions and assign through loc:

df.loc[df['Area'].eq("Q") & df['Stage'].eq('X'),'Area']='P'
print(df)

   ID Area Stage
0   1    P     X
1   2    P     X
2   3    P     X
3   4    Q     Y

Or with np.where:

df['Area'] = np.where(df['Area'].eq("Q") & df['Stage'].eq('X'),'P',df['Area'])
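As a self-contained check of the two variants above, here is a minimal sketch that rebuilds the question's sample frame and applies each one to its own copy. The helper name `make_frame` and the variables `df_loc` and `df_where` are hypothetical, introduced only for this illustration.

```python
import numpy as np
import pandas as pd


def make_frame():
    # Hypothetical helper: rebuilds the sample data from the question.
    return pd.DataFrame({
        "ID": [1, 2, 3, 4],
        "Area": ["P", "Q", "P", "Q"],
        "Stage": ["X", "X", "X", "Y"],
    })


# Variant 1: boolean mask + loc assigns in place on the selected rows.
df_loc = make_frame()
mask = df_loc["Area"].eq("Q") & df_loc["Stage"].eq("X")
df_loc.loc[mask, "Area"] = "P"

# Variant 2: np.where builds a whole new column, keeping the old
# value wherever the mask is False.
df_where = make_frame()
mask = df_where["Area"].eq("Q") & df_where["Stage"].eq("X")
df_where["Area"] = np.where(mask, "P", df_where["Area"])

print(df_loc)
print(df_where)
```

Both frames should end up with the Area column the question asks for; only row ID 2 changes, because it is the only row with Area "Q" and Stage "X".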

Answer 1: (score: 4)

Could you please try the following.

df['Area'] = np.where(df['Stage'] == 'X', 'P', df['Area'])

Answer 2: (score: 3)

You can use loc to specify which rows to replace, and pass the replaced Series as the value to assign:

df.loc[df['Stage']=='X', 'Area'] = df['Area'].replace('Q','P')

Output:

   ID Area Stage
0   1    P     X
1   2    P     X
2   3    P     X
3   4    Q     Y
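The right-hand side here is a Series covering every row, yet only the rows selected by loc are written, because pandas aligns the assigned Series on the index. A minimal sketch of that behaviour on the question's data (the variable name `replaced` is an assumption for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Area": ["P", "Q", "P", "Q"],
    "Stage": ["X", "X", "X", "Y"],
})

# replace() runs over the whole column, including the Stage "Y" row.
replaced = df["Area"].replace("Q", "P")
print(replaced.tolist())

# Only rows where Stage == "X" receive values from `replaced`;
# the matching is done by index label, not by position.
df.loc[df["Stage"] == "X", "Area"] = replaced
print(df)
```

The row with ID 4 keeps its "Q" even though `replaced` holds "P" at that index, since that row is not selected by the mask.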

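Answer 3: (score: 3)

Note: this is not an answer proposing a new method, but a comparison of the time each one takes to execute.

Thanks to pandas / numpy, every suggestion in the answers does the job almost "magically" in a single line of code. Getting the job done is good, but getting it done fast is better, so I wanted to compare the execution time of each.

Here is my program. In each loop I modify the dataframe twice, so that it is back in its initial state before moving on to the next approach (I am not a Python programmer, so apologies in advance if the approach is "pitiful"):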

import pandas as pd
import numpy as np
import time

df=pd.DataFrame({'ID' : [i for i in range(1,1000)],
                 'Area' : ['P' if (i & 1) else 'Q' for i in range(1,1000)],
                 'Stage' : [ 'X' if (i & 2) else 'Y' for i in range(1,1000)]})

t0=time.process_time()
for i in range(1,100):
    df.loc[df['Stage']=='X', 'Area'] = df['Area'].replace('Q','q')
    df.loc[df['Stage']=='X', 'Area'] = df['Area'].replace('q','Q')

print("Quang Hoang", '%.2f' % (time.process_time() - t0))

t0=time.process_time()
for i in range(1,100):
    df.loc[df['Stage'] == 'X', 'Area'] = 'q'
    df.loc[df['Stage'] == 'X', 'Area'] = 'Q'

print("Joe Ferndz", '%.2f' % (time.process_time() - t0))

t0=time.process_time()
for i in range(1,100):
    df.loc[df['Area'].eq("Q") & df['Stage'].eq('X'),'Area']='q'
    df.loc[df['Area'].eq("q") & df['Stage'].eq('X'),'Area']='Q'

print("anky 1", '%.2f' % (time.process_time() - t0))

t0=time.process_time()
for i in range(1,100):
    df['Area'] = np.where(df['Area'].eq("Q") & df['Stage'].eq('X'),'q',df['Area'])
    df['Area'] = np.where(df['Area'].eq("q") & df['Stage'].eq('X'),'Q',df['Area'])

print("anky 2", '%.2f' % (time.process_time() - t0))

t0=time.process_time()
for i in range(1,100):
    df['Area']=np.where(df['Stage']=='X','q',df['Area'])
    df['Area']=np.where(df['Stage']=='X','Q',df['Area'])

print("RavinderSingh13", '%.2f' % (time.process_time() - t0))

On my Pi 4, the results (in seconds) are:

Quang Hoang 1.60
Joe Ferndz 1.12
anky 1 1.55
anky 2 0.86
RavinderSingh13 0.38

If I use a dataframe with 100000 rows instead of 1000, the results are:

Quang Hoang 10.79
Joe Ferndz 6.61
anky 1 10.91
anky 2 9.64
RavinderSingh13 4.75

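Note that the suggestions from Joe Ferndz and RavinderSingh13 assume that Area is only ever "P" or "Q".

Answer 4: (score: 1)

To update one column based on the value in another column, use the following: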
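Because every loop above writes to the same df, the Stage-only variants leave the data altered for whatever runs after them. One possible way around that, sketched here under assumptions, is to time each approach on a fresh copy. The function names, the `repeats` parameter and the use of `time.perf_counter` instead of `time.process_time` are choices made for this sketch, not part of the original program, and no timings are claimed for it.

```python
import time

import numpy as np
import pandas as pd

# Same shape of test data as the program above.
base = pd.DataFrame({
    "ID": list(range(1, 1000)),
    "Area": ["P" if (i & 1) else "Q" for i in range(1, 1000)],
    "Stage": ["X" if (i & 2) else "Y" for i in range(1, 1000)],
})


def loc_two_conditions(frame):
    frame.loc[frame["Area"].eq("Q") & frame["Stage"].eq("X"), "Area"] = "P"
    return frame


def where_two_conditions(frame):
    mask = frame["Area"].eq("Q") & frame["Stage"].eq("X")
    frame["Area"] = np.where(mask, "P", frame["Area"])
    return frame


def loc_replace(frame):
    frame.loc[frame["Stage"] == "X", "Area"] = frame["Area"].replace("Q", "P")
    return frame


def time_approach(fn, repeats=20):
    """Sum the time spent in fn over `repeats` runs, each on a fresh copy."""
    total = 0.0
    for _ in range(repeats):
        frame = base.copy()  # copying happens outside the timed region
        start = time.perf_counter()
        fn(frame)
        total += time.perf_counter() - start
    return total


timings = {
    fn.__name__: time_approach(fn)
    for fn in (loc_two_conditions, where_two_conditions, loc_replace)
}
for name, seconds in timings.items():
    print(name, "%.4f" % seconds)
```

Each run starts from identical data, so the order in which the approaches are timed no longer affects what they operate on.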

df.loc[df['Stage'] == 'X', 'Area'] = 'P'

This checks whether the value of 'Stage' is X. Where that is True, it sets the value of 'Area' to 'P', whatever the value was before.
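The difference between testing Stage alone and testing both columns only shows up when Area can hold something other than "P" or "Q". A small sketch, assuming a hypothetical third Area value "R" that does not appear in the question's data:

```python
import pandas as pd

# "R" is a made-up extra category, used only to show the difference.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Area": ["P", "Q", "R", "Q", "R"],
    "Stage": ["X", "X", "X", "Y", "Y"],
})

# Stage-only condition: every Stage "X" row becomes "P", including "R".
stage_only = df.copy()
stage_only.loc[stage_only["Stage"] == "X", "Area"] = "P"

# Two conditions: only "Q" rows in Stage "X" are changed.
both = df.copy()
both.loc[both["Area"].eq("Q") & both["Stage"].eq("X"), "Area"] = "P"

print(stage_only["Area"].tolist())
print(both["Area"].tolist())
```

With only "P" and "Q" in the column, as in the question, the two forms give the same result and the shorter one is enough.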