Currently, this works:
df['new'] = df.apply(
    lambda x: address[int(x['c1'][:5], 2)] + '_' + str(int(x['c1'][6:11], 2))
    if x['c1'][5] == '1'
    else address[int(x['c2'][:5], 2)] + '_' + str(int(x['c2'][6:11], 2)),
    axis=1)
where address is a dictionary.
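But it is very slow. Specifically, running apply over the whole DataFrame is much slower than applying it to selected columns only. However, the new column depends on more than one column, and I don't know how to do that.
Also, is there a way to vectorize this kind of logical/conditional statement?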
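For reference, here is a minimal sketch of the row-wise logic restricted to the two columns it actually needs. It assumes a hypothetical `address` dict mapping the integers 0-31 to names, since the real one is not shown. Iterating with `zip` over just `c1` and `c2` avoids building a full row Series for every call, which is a large part of what makes `apply(axis=1)` slow. It is still a Python-level loop, though, not true vectorization.

```python
import pandas as pd

# Hypothetical lookup table: the question's real `address` dict is not shown.
address = {i: f"addr{i}" for i in range(32)}

def label(bits: str) -> str:
    # Bits 0..4 -> address key; bits 6..10 -> numeric suffix (bit 5 is the flag).
    return address[int(bits[:5], 2)] + "_" + str(int(bits[6:11], 2))

def build_new(df: pd.DataFrame) -> list:
    # Use c1 when its character at index 5 is '1', otherwise use c2,
    # looping only over the two columns that are needed.
    return [label(c1 if c1[5] == "1" else c2) for c1, c2 in zip(df["c1"], df["c2"])]

df = pd.DataFrame({
    "c1": ["0000100111000111", "0101010001001010"],
    "c2": ["0010110011000111", "0000000000000000"],
})
df["new"] = build_new(df)
print(df["new"].tolist())
```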
Sample DataFrame:
  c1               c2
0 0000100111000111 0010110011000111
1 0001000111000111 0010110011000111
2 0101010001001010 0000000000000000
3 0101010010001110 0000000000000000
4 0101010011101010 0000000000000000
5 0111111100000100 0000000000000000
6 0111110010010110 0000000000000000
7 1000000001001100 0000000000000000
8 1110011110001000 0000000000000000
9 0000100001010000 0000000000000000
10 0001000001001010 0000000000000000
11 0101101100100100 0000000000000000
12 1110001100100100 0000000000000000
13 0010100101101001 0101010101101001
14 0000100101100000 0000000000000000
15 0000100110100000 0000000000000000
16 0001000101101011 0000000000000000
17 1001110000100001 0000000000000000
18 0111111000100000 0000000000000000
19 1000000100010110 0000000000000000
20 1110001111000010 0000000000000000
21 1011010001000010 0000000000000000
22 0110010001001111 0000000000000000
23 0111110000110101 0000000000000000
24 0111110001001100 0000000000000000
25 1000000000111101 0000000000000000
26 0000110001100010 0000000000000000
27 0001010001100010 0000000000000000
28 1100100100100101 1001011000000101
29 0101000010101010 0111110001001010
... ... ...
95714 0101111100011000 0000000000000000
95715 0010101011001011 0000000000000000
95716 0010100111100110 0101010110100110
95717 0010101000100100 0101011011100100
95718 0101000110000101 0000000000000000
Answer 0 (score: 3)
You need a vectorized if-then-else, also known as np.where (np stands for numpy, just in case):
import numpy as np
df['new'] = np.where(df['c1'].str[5] == '1',
                     df['c1'].str[:5],
                     df['c2'].str[:5])
# c1 c2 new
#0 0000100111000111 0010110011000111 00101
#1 0001000111000111 0010110011000111 00101
#2 0101010001001010 0000000000000000 01010
#....
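The np.where call above only selects the 5-character prefix. Here is a sketch extending the same idea to the question's full output (address name plus numeric suffix). It assumes a hypothetical `address` dict with integer keys 0-31, since the real one is not shown. A 5-bit string has only 32 possible values, so a small precomputed dict lets Series.map replace the per-row int(s, 2) call.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table standing in for the question's `address` dict.
address = {i: f"addr{i}" for i in range(32)}

# Every 5-character bit string maps to an int in 0..31.
five_bit = {format(i, "05b"): i for i in range(32)}

df = pd.DataFrame({
    "c1": ["0000100111000111", "0101010001001010"],
    "c2": ["0010110011000111", "0000000000000000"],
})

# Vectorized if-then-else: pick the whole source string for each row.
src = pd.Series(np.where(df["c1"].str[5] == "1", df["c1"], df["c2"]), index=df.index)

# Bits 0..4 -> address key, bits 6..10 -> numeric suffix.
key = src.str[:5].map(five_bit)
suffix = src.str[6:11].map(five_bit).astype(str)

df["new"] = key.map(address) + "_" + suffix
print(df["new"].tolist())
```

Any bit pattern missing from `address` would come out as NaN rather than raising a KeyError as the original apply does.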
Answer 1 (score: 3)
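It looks like you are trying to compute values based on the characters of the string column c1. Row-by-row string manipulation like this is slow, but pandas can help with its .str functions: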
# begin by setting all of the values to what you want from c1
df['new'] = df['c1'].str.slice(stop=5)
# where c1's flag character (index 5) is not '1', take the value from 'c2' instead
df.loc[df['c1'].str.get(5) != '1', 'new'] = df['c2'].str.slice(stop=5)
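The same "start with c1, fall back to c2" pattern also fits in one expression with Series.where, which keeps values where the condition is True and takes them from the second argument elsewhere. Here is a minimal sketch on two rows of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "c1": ["0000100111000111", "0101010001001010"],
    "c2": ["0010110011000111", "0000000000000000"],
})

take_c1 = df["c1"].str[5] == "1"
# Keep c1's prefix where the flag is set, otherwise use c2's prefix.
df["new"] = df["c1"].str[:5].where(take_c1, df["c2"].str[:5])
print(df["new"].tolist())
```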
Answer 2 (score: 2)
Use Boolean indexing:
df['new'] = df.c1.str[:5]
df.loc[df.c1.str[5] != '1', 'new'] = df.c2.str[:5][df.c1.str[5] != '1']
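A quick way to check any of these rewrites is to compare it with the question's row-wise apply on a few sample rows. The sketch below reduces both sides to the 5-character prefix and leaves out the `address` lookup. Rows whose flag is not '1' fall back to c2, matching the question's if/else.

```python
import pandas as pd

df = pd.DataFrame({
    "c1": ["0000100111000111", "0101010001001010", "0010100101101001"],
    "c2": ["0010110011000111", "0000000000000000", "0101010101101001"],
})

# Reference: the question's row-wise logic, reduced to the prefix.
expected = df.apply(lambda x: x["c1"][:5] if x["c1"][5] == "1" else x["c2"][:5], axis=1)

# Boolean-indexing version: start from c1, overwrite the fallback rows from c2.
new = df["c1"].str[:5]
fallback = df["c1"].str[5] != "1"
new.loc[fallback] = df.loc[fallback, "c2"].str[:5]

print((new == expected).all())
```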