I have a DataFrame df in which some columns describe the units in a given bin and their output, as shown below:
df = pd.DataFrame({'bin_dir' : pd.cut(np.rad2deg(np.random.vonmises(np.pi,0.03,100)) % 360,np.arange(0,365,5)),
'Unit' : np.tile(np.arange(1,11),10),
'value' : np.random.randn(100)*1000+3600})
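I now want to create a column col1 whose value is 1 when Unit is 1, 3 or 5 and bin_dir is one of (350,355), (355,360), (0,5), (5,10), and 2 when Unit is 2, 4 or 9 and bin_dir is one of (350,355), (355,360), (0,5), (5,10).
How can I do this? In dplyr, I could use mutate with nested ifelse statements.
It would be nice if the solution could be incorporated into a chained command :)
Thanks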
Answer 0: (score: 3)
You can use nested np.where() calls:
import re
import numpy as np
import pandas as pd
In [50]: bins = re.findall(r'\(.*?\]', '(350, 355], (355, 360], (0, 5], (5, 10]')
...: bin_mask = df.bin_dir.isin(bins)
...: unit_mask1 = df.Unit.isin([1,3,5])
...: unit_mask2 = df.Unit.isin([2,4,9])
...:
In [51]: df.assign(col1=
...: np.where(bin_mask & unit_mask1,
...: 1,
...: np.where(bin_mask & unit_mask2, 2, np.nan)
...: )
...: )
...:
Out[51]:
Unit bin_dir value col1
0 1 (195, 200] 1228.056261 NaN
1 2 (125, 130] 3246.052662 NaN
2 3 (150, 155] 3128.356490 NaN
3 4 (215, 220] 2900.812099 NaN
4 5 (110, 115] 4324.152904 NaN
5 6 (150, 155] 4783.110204 NaN
6 7 (240, 245] 4810.120258 NaN
7 8 (210, 215] 4307.576911 NaN
8 9 (15, 20] 3043.099987 NaN
9 10 (0, 5] 4633.435048 NaN
10 1 (145, 150] 3401.690163 NaN
11 2 (320, 325] 4224.314088 NaN
12 3 (350, 355] 4037.081806 1.0
13 4 (295, 300] 3096.652374 NaN
14 5 (235, 240] 4738.227922 NaN
15 6 (235, 240] 1973.561204 NaN
16 7 (270, 275] 3500.619163 NaN
17 8 (45, 50] 4234.621801 NaN
18 9 (255, 260] 4267.575087 NaN
19 10 (320, 325] 3031.733130 NaN
20 1 (235, 240] 3137.832272 NaN
21 2 (330, 335] 4113.654195 NaN
22 3 (265, 270] 3060.886390 NaN
23 4 (290, 295] 2836.105371 NaN
24 5 (255, 260] 2756.894839 NaN
.. ... ... ... ...
75 6 (325, 330] 2471.775169 NaN
76 7 (70, 75] 4463.964881 NaN
77 8 (110, 115] 5681.124294 NaN
78 9 (135, 140] 2500.650717 NaN
79 10 (225, 230] 2936.364153 NaN
80 1 (280, 285] 1138.591459 NaN
81 2 (250, 255] 3121.142300 NaN
82 3 (150, 155] 2991.257906 NaN
83 4 (160, 165] 3078.156743 NaN
84 5 (130, 135] 4335.076559 NaN
85 6 (85, 90] 4970.471290 NaN
86 7 (335, 340] 3207.906304 NaN
87 8 (350, 355] 3605.474926 NaN
88 9 (125, 130] 4922.963220 NaN
89 10 (60, 65] 3121.061944 NaN
90 1 (105, 110] 3092.191627 NaN
91 2 (0, 5] 3693.602055 2.0
92 3 (195, 200] 2291.508096 NaN
93 4 (40, 45] 4628.409801 NaN
94 5 (215, 220] 3327.321452 NaN
95 6 (110, 115] 4347.471046 NaN
96 7 (110, 115] 4494.707840 NaN
97 8 (110, 115] 3545.460851 NaN
98 9 (55, 60] 2831.042251 NaN
99 10 (30, 35] 3705.225870 NaN
[100 rows x 4 columns]
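A note for newer pandas versions: pd.cut there produces a categorical column of Interval objects rather than strings, so comparing bin_dir directly against string labels may match nothing. The following is a minimal sketch, not the answerer's code, that compares the string form of each interval instead. The toy angles and Unit values are made up so the result is deterministic.

```python
import numpy as np
import pandas as pd

# Deterministic toy data; these angles and units are invented for illustration.
angles = np.array([352.0, 357.0, 2.0, 7.0, 120.0, 200.0])
df = pd.DataFrame({
    'bin_dir': pd.cut(angles, np.arange(0, 365, 5)),
    'Unit': [1, 2, 3, 4, 5, 9],
})

bins = ['(350, 355]', '(355, 360]', '(0, 5]', '(5, 10]']

# Convert each interval to its label, e.g. '(350, 355]', before matching.
bin_mask = df['bin_dir'].astype(str).isin(bins)
unit_mask1 = df['Unit'].isin([1, 3, 5])
unit_mask2 = df['Unit'].isin([2, 4, 9])

out = df.assign(
    col1=np.where(bin_mask & unit_mask1, 1,
                  np.where(bin_mask & unit_mask2, 2, np.nan))
)
print(out)
```

The first four rows fall in the wanted bins and get 1 or 2 depending on Unit; the last two rows are outside those bins and stay NaN.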
Of course, you can also do this without the precomputed masks:
In [52]: df.assign(col1=
...: np.where(df.bin_dir.isin(bins) & df.Unit.isin([1,3,5]),
...: 1,
...: np.where(df.bin_dir.isin(bins) & df.Unit.isin([2,4,9]),
...: 2,
...: np.nan
...: )
...: )
...: )
...:
Out[52]:
Unit bin_dir value col1
0 1 (195, 200] 1228.056261 NaN
1 2 (125, 130] 3246.052662 NaN
2 3 (150, 155] 3128.356490 NaN
3 4 (215, 220] 2900.812099 NaN
4 5 (110, 115] 4324.152904 NaN
5 6 (150, 155] 4783.110204 NaN
6 7 (240, 245] 4810.120258 NaN
7 8 (210, 215] 4307.576911 NaN
8 9 (15, 20] 3043.099987 NaN
9 10 (0, 5] 4633.435048 NaN
10 1 (145, 150] 3401.690163 NaN
11 2 (320, 325] 4224.314088 NaN
12 3 (350, 355] 4037.081806 1.0
13 4 (295, 300] 3096.652374 NaN
14 5 (235, 240] 4738.227922 NaN
15 6 (235, 240] 1973.561204 NaN
16 7 (270, 275] 3500.619163 NaN
17 8 (45, 50] 4234.621801 NaN
18 9 (255, 260] 4267.575087 NaN
19 10 (320, 325] 3031.733130 NaN
20 1 (235, 240] 3137.832272 NaN
21 2 (330, 335] 4113.654195 NaN
22 3 (265, 270] 3060.886390 NaN
23 4 (290, 295] 2836.105371 NaN
24 5 (255, 260] 2756.894839 NaN
.. ... ... ... ...
75 6 (325, 330] 2471.775169 NaN
76 7 (70, 75] 4463.964881 NaN
77 8 (110, 115] 5681.124294 NaN
78 9 (135, 140] 2500.650717 NaN
79 10 (225, 230] 2936.364153 NaN
80 1 (280, 285] 1138.591459 NaN
81 2 (250, 255] 3121.142300 NaN
82 3 (150, 155] 2991.257906 NaN
83 4 (160, 165] 3078.156743 NaN
84 5 (130, 135] 4335.076559 NaN
85 6 (85, 90] 4970.471290 NaN
86 7 (335, 340] 3207.906304 NaN
87 8 (350, 355] 3605.474926 NaN
88 9 (125, 130] 4922.963220 NaN
89 10 (60, 65] 3121.061944 NaN
90 1 (105, 110] 3092.191627 NaN
91 2 (0, 5] 3693.602055 2.0
92 3 (195, 200] 2291.508096 NaN
93 4 (40, 45] 4628.409801 NaN
94 5 (215, 220] 3327.321452 NaN
95 6 (110, 115] 4347.471046 NaN
96 7 (110, 115] 4494.707840 NaN
97 8 (110, 115] 3545.460851 NaN
98 9 (55, 60] 2831.042251 NaN
99 10 (30, 35] 3705.225870 NaN
[100 rows x 4 columns]
But it will be slower and looks a bit cumbersome.
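If the nesting is the main annoyance, np.select is one possible alternative: it takes a list of conditions and a parallel list of values, so adding a third case does not add another level. The sketch below is an assumption about how that could look here, not part of the original answer; label_units is a hypothetical helper name, and passing it to assign keeps the whole thing usable inside a method chain. The toy data is invented.

```python
import numpy as np
import pandas as pd

# Deterministic toy data; these angles and units are invented for illustration.
angles = np.array([352.0, 357.0, 2.0, 7.0, 120.0, 200.0])
df = pd.DataFrame({
    'bin_dir': pd.cut(angles, np.arange(0, 365, 5)),
    'Unit': [1, 2, 3, 4, 5, 9],
})

bins = ['(350, 355]', '(355, 360]', '(0, 5]', '(5, 10]']


def label_units(d):
    """Hypothetical helper: 1 or 2 for matching rows, NaN otherwise."""
    # astype(str) so the match also works when bin_dir holds Interval objects.
    in_bins = d['bin_dir'].astype(str).isin(bins)
    conditions = [
        in_bins & d['Unit'].isin([1, 3, 5]),
        in_bins & d['Unit'].isin([2, 4, 9]),
    ]
    # The first matching condition wins; rows matching none get the default.
    return np.select(conditions, [1, 2], default=np.nan)


# assign accepts a callable, so this step can sit in the middle of a chain.
out = df.assign(col1=label_units)
print(out)
```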
Answer 1: (score: 0)
Using a list comprehension:
bin_filt = ['(350, 355]', '(355, 360]', '(0, 5]', '(5, 10]']
# Creates a column 'col1': 1 where Unit and bin both match, otherwise 0
# str(b) lets the comparison work whether bin_dir holds strings or Interval objects
df['col1'] = [1 if u in [1, 3, 5] and str(b) in bin_filt else 0 for u, b in zip(df['Unit'], df['bin_dir'])]
# Creates a column 'col2': 2 where Unit and bin both match, otherwise 0
df['col2'] = [2 if u in [2, 4, 9] and str(b) in bin_filt else 0 for u, b in zip(df['Unit'], df['bin_dir'])]
# You can replace the value after 'else' in each list comprehension with whatever default you want
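As a possible vectorized counterpart to the list comprehension, the Unit-to-label rule can be written as a lookup table and applied with Series.map, then blanked outside the wanted bins with Series.where. This is only a sketch under assumptions: unit_to_label is a hypothetical name, the toy data is invented, and the result goes into a single col1 with NaN for non-matching rows rather than two columns filled with 0.

```python
import numpy as np
import pandas as pd

# Deterministic toy data; these angles and units are invented for illustration.
angles = np.array([352.0, 357.0, 2.0, 7.0, 120.0, 3.0])
df = pd.DataFrame({
    'bin_dir': pd.cut(angles, np.arange(0, 365, 5)),
    'Unit': [1, 2, 3, 4, 5, 6],
})

bin_filt = ['(350, 355]', '(355, 360]', '(0, 5]', '(5, 10]']

# Hypothetical lookup table: which label each Unit would receive.
unit_to_label = {1: 1, 3: 1, 5: 1, 2: 2, 4: 2, 9: 2}

in_bins = df['bin_dir'].astype(str).isin(bin_filt)

# map gives NaN for units missing from the table;
# where keeps the mapped value only for rows inside the wanted bins.
df['col1'] = df['Unit'].map(unit_to_label).where(in_bins)
print(df)
```

Here the row with Unit 5 is dropped to NaN because its bin is not in bin_filt, and the row with Unit 6 is NaN because 6 has no entry in the lookup table.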