Quick question. I want to create a column in my df that classifies the values in another column. Take a look at the code below.
df['maker_grp'] = np.nan
for key in df[df['maker_nm'].str.contains("Sam|Mike")].index:
    df['maker_grp'][key] = 'Class1'
for key in df[df['maker_nm'].str.contains("Andy|John|Paul|Jay")].index:
    df['maker_grp'][key] = 'Class2'
df['maker_grp'] = df.maker_grp.fillna('Class3')
It works fine, but I feel there must be a more Pythonic way to do this with less processing. Please help. Thanks.
Answer 0 (score: 1)
Use numpy.select:
m1 = df['maker_nm'].str.contains("Sam|Mike")
m2 = df['maker_nm'].str.contains("Andy|John|Paul|Jay")
df['maker_grp'] = np.select([m1,m2], ['Class1','Class2'], default='Class3')
Sample:
df = pd.DataFrame({'maker_nm':['Sam 1','Joe 5','Paul 7','Mike 0']})
# print(df)
m1 = df['maker_nm'].str.contains("Sam|Mike")
m2 = df['maker_nm'].str.contains("Andy|John|Paul|Jay")
df['maker_grp'] = np.select([m1,m2], ['Class1','Class2'], default='Class3')
print(df)
  maker_nm maker_grp
0    Sam 1    Class1
1    Joe 5    Class3
2   Paul 7    Class2
3   Mike 0    Class1
With many conditions, a custom function passed to apply should be faster:
import re
def f(x):
    p1 = re.compile("Sam|Mike")
    p2 = re.compile("Andy|John|Paul|Jay")
    # search (not match) finds the pattern anywhere in the string, like str.contains
    if p1.search(x):
        return 'Class1'
    elif p2.search(x):
        return 'Class2'
    else:
        return 'Class3'
df['maker_grp'] = df['maker_nm'].apply(f)
Timings:
df = pd.DataFrame({'maker_nm':['Sam 1','Joe 5','Paul 7','Mike 0']})
df = pd.concat([df] * 1000, ignore_index=True)
# print(df)
In [117]: %%timeit
     ...: df['maker_grp'] = np.nan
     ...: for key in df[df['maker_nm'].str.contains("Sam|Mike")].index:
     ...:     df['maker_grp'][key] = 'Class1'
     ...: for key in df[df['maker_nm'].str.contains("Andy|John|Paul|Jay")].index:
     ...:     df['maker_grp'][key] = 'Class2'
     ...: df['maker_grp'] = df.maker_grp.fillna('Class3')
     ...:
In [118]: %%timeit
     ...: m1 = df['maker_nm'].str.contains("Sam|Mike")
     ...: m2 = df['maker_nm'].str.contains("Andy|John|Paul|Jay")
     ...:
     ...: df['maker_grp'] = np.select([m1,m2], ['Class1','Class2'], default='Class3')
     ...:
100 loops, best of 3: 5.98 ms per loop
In [119]: %%timeit
     ...: df['maker_grp'] = df['maker_nm'].apply(f)
     ...:
100 loops, best of 3: 7.38 ms per loop
Caveat:
Performance really depends on the data and the number of conditions.
Edit: For many conditions that check for substrings:
m1 = df['maker_nm'].str.contains("Sam", regex=False)
m2 = df['maker_nm'].str.contains("Mike", regex=False)
m3 = df['maker_nm'].str.contains("Andy", regex=False)
m4 = df['maker_nm'].str.contains("John", regex=False)
m5 = df['maker_nm'].str.contains("Paul", regex=False)
m6 = df['maker_nm'].str.contains("Jay", regex=False)
df['maker_grp'] = np.select([m1,m2,m3,m4,m5,m6], ['Class1','Class1','Class2','Class2','Class2','Class2'], default='Class3')
print(df)
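The masks above pass regex=False, which makes str.contains look for the literal substring instead of treating the pattern as a regular expression. A small sketch of the difference, using a made-up Series `s` that is not part of the original data:

```python
import pandas as pd

# Hypothetical data, chosen only to show how "|" is interpreted
s = pd.Series(['a|b', 'a', 'b'])

# Default (regex=True): "a|b" means "a" OR "b", so every row matches
regex_hits = s.str.contains('a|b')

# regex=False: the pattern is the literal text "a|b", so only the first row matches
literal_hits = s.str.contains('a|b', regex=False)

print(regex_hits.tolist())
print(literal_hits.tolist())
```

This is why the combined pattern "Sam|Mike" has to be split into one mask per name once regex=False is used.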
The equivalent custom function for apply:
def f(x):
    if 'Sam' in x:
        return 'Class1'
    elif 'Mike' in x:
        return 'Class1'
    elif 'Andy' in x:
        return 'Class2'
    elif 'John' in x:
        return 'Class2'
    elif 'Paul' in x:
        return 'Class2'
    elif 'Jay' in x:
        return 'Class2'
    else:
        return 'Class3'
df['maker_grp'] = df['maker_nm'].apply(f)
print(df)
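When the list of names grows, writing each mask by hand gets tedious. One possible variation, sketched here under the assumption that the names are kept in a dict (the `groups` mapping is hypothetical and not part of the original answer), builds one mask per class and hands them to np.select:

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'maker_nm': ['Sam 1', 'Joe 5', 'Paul 7', 'Mike 0']})

# Hypothetical mapping: class label -> substrings that belong to it
groups = {
    'Class1': ['Sam', 'Mike'],
    'Class2': ['Andy', 'John', 'Paul', 'Jay'],
}

# One boolean mask per class; re.escape keeps each name literal inside the regex
masks = [
    df['maker_nm'].str.contains('|'.join(re.escape(name) for name in names))
    for names in groups.values()
]

# np.select takes the first matching condition per row, else the default
df['maker_grp'] = np.select(masks, list(groups.keys()), default='Class3')
print(df)
```

Because np.select picks the first condition that is True, the order of the entries in `groups` decides which class wins when a value matches more than one.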
Answer 1 (score: 1)
I think this can be done very concisely with pandas. It should be faster than iterating over each key with a for loop.
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'maker_nm':['Sam 1','Joe 5','Paul 7','Mike 0']})
In [3]: conditions = {'Sam|Mike': 'Class1', 'Andy|John|Paul|Jay': 'Class2'}
In [4]: df.join(pd.concat([df[df.maker_nm.str.contains(c)].assign(maker_grp=conditions[c])
...: for c in conditions]).maker_grp).fillna('Class3')
...:
Out[4]:
  maker_nm maker_grp
0    Sam 1    Class1
1    Joe 5    Class3
2   Paul 7    Class2
3   Mike 0    Class1
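The same `conditions` dict can probably also drive plain boolean indexing with .loc, which avoids both the concat/join step and the chained assignment in the question. This is only a sketch of that alternative, not the method used in the answer above:

```python
import pandas as pd

df = pd.DataFrame({'maker_nm': ['Sam 1', 'Joe 5', 'Paul 7', 'Mike 0']})
conditions = {'Sam|Mike': 'Class1', 'Andy|John|Paul|Jay': 'Class2'}

# Start with the fallback class, then overwrite the rows matching each pattern
df['maker_grp'] = 'Class3'
for pattern, label in conditions.items():
    mask = df['maker_nm'].str.contains(pattern)
    df.loc[mask, 'maker_grp'] = label

print(df)
```

Note that with this loop a later pattern overwrites an earlier one if a value matches both, whereas np.select keeps the first match.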