Assigning values to a column based on other columns in the same pandas dataframe

Date: 2016-10-21 10:52:35

Tags: python pandas lambda conditional-statements

Using a dataframe, I have a column named TM52_fail:

TM52_fail
2
1
-
1 & 2
1 & 2 & 3
-
-
3
etc.

I want to create an additional column named TM52_fail_norm whose contents depend on the contents of column TM52_fail. My attempt (including the conditional fill):

def str_to_number(x):
    if x=="1" or x=="2" or x=="3":
        return 1
    elif x=="1 & 2" or x=="2 & 3" or x=="1 & 3":
        return 2
    elif x=="1 & 2 & 3":
        return 3
    else:
        return 0

df['TM52_fail_norm'] = ""
df['TM52_fail_norm'].apply(lambda x: str_to_number(x for x in df['TM52_fail']))

returns an empty column.
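In the attempt, the Series returned by apply is never assigned back, so TM52_fail_norm keeps the empty strings set on the line before. The lambda also ignores its argument and passes a whole generator to str_to_number, which never equals any of the strings and so falls into the else branch. A minimal sketch of the corrected call, assuming TM52_fail holds strings like the sample shown:

```python
import pandas as pd

# Sample data modeled on the column shown in the question
df = pd.DataFrame({'TM52_fail': ["2", "1", "-", "1 & 2", "1 & 2 & 3", "-", "-", "3"]})

def str_to_number(x):
    # Number of failed criteria encoded in each combination string
    if x == "1" or x == "2" or x == "3":
        return 1
    elif x == "1 & 2" or x == "2 & 3" or x == "1 & 3":
        return 2
    elif x == "1 & 2 & 3":
        return 3
    else:
        return 0

# Series.apply calls the function once per element and returns a new Series;
# assigning that Series is what actually fills the new column.
df['TM52_fail_norm'] = df['TM52_fail'].apply(str_to_number)
print(df)
```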

2 answers:

Answer 0 (score: 2)

I think you need to convert to strings with astype and then apply the function str_to_number:

df['new'] = df['TM52_fail_norm'].astype(str).apply(str_to_number)
print (df)
  TM52_fail_norm  new
0              2    1
1              1    1
2              -    0
3          1 & 2    2
4      1 & 2 & 3    3
5              -    0
6              -    0
7              3    1
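The astype(str) call matters when the column is not made up purely of strings. A small sketch, assuming (hypothetically; the question does not say so) that single criteria were parsed as integers, as can happen when reading a spreadsheet. The function below keeps the question's branching, written with tuple membership:

```python
import pandas as pd

def str_to_number(x):
    # Same branching as the question's function, using tuple membership
    if x in ("1", "2", "3"):
        return 1
    elif x in ("1 & 2", "2 & 3", "1 & 3"):
        return 2
    elif x == "1 & 2 & 3":
        return 3
    return 0

# Hypothetical object column where single criteria were parsed as ints
s = pd.Series([2, 1, "-", "1 & 2", "1 & 2 & 3", "-", "-", 3])

without_cast = s.apply(str_to_number)           # ints 2, 1 and 3 never equal "2", "1", "3", so they give 0
with_cast = s.astype(str).apply(str_to_number)  # every value is compared as a string

print(without_cast.tolist())  # [0, 0, 0, 2, 3, 0, 0, 0]
print(with_cast.tolist())     # [1, 1, 0, 2, 3, 0, 0, 1]
```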

Another solution uses a dict with map; then you need fillna with 0 and finally a cast to int:

d = {'1':1,'2':1,'3':1,'1 & 2':2, '2 & 3':2, '1 & 3':2,'1 & 2 & 3':3}
df['new'] = df['TM52_fail_norm'].map(d)
df['new'] = df['new'].fillna(0).astype(int)
print (df)
  TM52_fail_norm  new
0              2    1
1              1    1
2              -    0
3          1 & 2    2
4      1 & 2 & 3    3
5              -    0
6              -    0
7              3    1
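The fillna and astype steps follow from how Series.map treats a dict: values with no matching key become NaN, and NaN forces a float dtype. A short sketch on the sample values:

```python
import pandas as pd

d = {'1': 1, '2': 1, '3': 1, '1 & 2': 2, '2 & 3': 2, '1 & 3': 2, '1 & 2 & 3': 3}
s = pd.Series(["2", "1", "-", "1 & 2", "1 & 2 & 3", "-", "-", "3"])

mapped = s.map(d)
# '-' is not a key in d, so those rows become NaN and the result is float64
print(mapped.dtype)          # float64
print(mapped.isna().sum())   # 3

# fillna(0) replaces the NaNs, then astype(int) turns 1.0, 2.0, ... back into integers
result = mapped.fillna(0).astype(int)
print(result.tolist())       # [1, 1, 0, 2, 3, 0, 0, 1]
```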

Timings

#[800000 rows x 1 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

def jez1(df):
    d = {'1':1,'2':1,'3':1,'1 & 2':2, '2 & 3':2, '1 & 3':2,'1 & 2 & 3':3}

    df['new'] = df['TM52_fail_norm'].map(d)
    df['new'] = df['new'].fillna(0).astype(int)
    return (df)

print (jez1(df))

In [315]: %timeit (jez1(df))
10 loops, best of 3: 63 ms per loop

In [316]: %timeit (df['TM52_fail_norm'].astype(str).apply(str_to_number))
1 loop, best of 3: 518 ms per loop

#http://stackoverflow.com/a/40176883/2901002
In [345]: %timeit (df.TM52_fail_norm.str.count('\d+'))
1 loop, best of 3: 707 ms per loop
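%timeit is an IPython magic. Outside IPython, a rough equivalent uses the standard-library timeit module. A sketch comparing the three approaches, where with_map, with_apply and with_regex are hypothetical wrapper names and the data is smaller than above to keep the run short; absolute numbers will vary by machine and pandas version:

```python
import timeit

import pandas as pd

d = {'1': 1, '2': 1, '3': 1, '1 & 2': 2, '2 & 3': 2, '1 & 3': 2, '1 & 2 & 3': 3}

def str_to_number(x):
    # Same branching as the question's function
    if x in ("1", "2", "3"):
        return 1
    elif x in ("1 & 2", "2 & 3", "1 & 3"):
        return 2
    elif x == "1 & 2 & 3":
        return 3
    return 0

# Hypothetical wrappers around the three approaches timed above
def with_map(s):
    return s.map(d).fillna(0).astype(int)

def with_apply(s):
    return s.astype(str).apply(str_to_number)

def with_regex(s):
    return s.str.count(r'\d+')

base = pd.Series(["2", "1", "-", "1 & 2", "1 & 2 & 3", "-", "-", "3"])
# 80,000 rows instead of 800,000
s = pd.concat([base] * 10000, ignore_index=True)

for name, fn in [("map", with_map), ("apply", with_apply), ("regex", with_regex)]:
    # Best of 3 repeats, each timing a single call
    best = min(timeit.repeat(lambda: fn(s), number=1, repeat=3))
    print(f"{name}: {best * 1000:.1f} ms")
```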


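Answer 1 (score: 1)

TL;DR: df.TM52_fail.str.count(r'\d+')

It looks like what you really want is to count the numbers in each entry. pandas' .str accessor methods (docs, summary of .str methods) are very handy here!

I assume TM52_fail has dtype str; otherwise you can cast it with .astype(str), as @jezrael suggested: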
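The pattern \d+ matches a run of consecutive digits, so each number counts once however many digits it has. A small sketch contrasting it with \d; the value "10 & 2" is hypothetical and only there to show a multi-digit criterion:

```python
import pandas as pd

s = pd.Series(["2", "1 & 2", "1 & 2 & 3", "-", "10 & 2"])

# \d+ : one match per run of digits, so "10" counts once
print(s.str.count(r'\d+').tolist())  # [1, 2, 3, 0, 2]
# \d  : one match per digit, so "10" counts twice
print(s.str.count(r'\d').tolist())   # [1, 2, 3, 0, 3]
```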

# setup
import pandas as pd
df = pd.DataFrame({'TM52_fail':[
    "2", "1", "", "1 & 2", "1 & 2 & 3", "", "", "3"]})

# Use regex \d+ to find 1 or more consecutive digits
df['TM52_fail_norm2'] = df.TM52_fail.str.count(r'\d+')
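When the column is not all strings, the .str methods return NaN for non-string elements, which is why the cast mentioned above is needed. A sketch, assuming a hypothetical mixed column where single criteria are stored as ints, that also checks the regex count against the dict/map approach from the other answer on this data:

```python
import pandas as pd

# Hypothetical mixed-dtype column: single criteria stored as ints
df = pd.DataFrame({'TM52_fail': [2, 1, "", "1 & 2", "1 & 2 & 3", "", "", 3]})

# Cast first so every element goes through the regex
counts = df.TM52_fail.astype(str).str.count(r'\d+')

# Dict/map approach for comparison
d = {'1': 1, '2': 1, '3': 1, '1 & 2': 2, '2 & 3': 2, '1 & 3': 2, '1 & 2 & 3': 3}
mapped = df.TM52_fail.astype(str).map(d).fillna(0).astype(int)

print(counts.tolist())                      # [1, 1, 0, 2, 3, 0, 0, 1]
print(counts.tolist() == mapped.tolist())   # True
```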

Timings

Regex: 155 µs per loop
 jez1: 999 µs per loop