Dataset
import pandas as pd
df = pd.DataFrame({'a': [0, 3, 4], 'b': ['0101010', '0100010', '0111100']})
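Basically, I'm trying to create a column that takes a substring of length 1 from column b, starting at the position number given in column a.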
Attempt
position = df['a']
df['c'] = df['b'].str[position]
Desired output
a  b        c
0  0101010  0
3  0100010  0
4  0111100  1
Answer 0 (score: 3)
Use a list comprehension with zip:
df['c'] = [b[a] for a, b in zip(df.a, df.b)]
Or use apply:
df['c'] = df.apply(lambda x: x['b'][x['a']], axis=1)
print(df)
   a        b  c
0  0  0101010  0
1  3  0100010  0
2  4  0111100  1
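If some positions in column a might fall outside the string in column b, plain indexing raises an IndexError. A minimal sketch of one way to guard against that, assuming None is an acceptable fill value; the helper name char_at is hypothetical and not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 3, 4], 'b': ['0101010', '0100010', '0111100']})

def char_at(s, i):
    # Hypothetical helper: return the character at position i,
    # or None when i falls outside the string (an assumed fill value).
    return s[i] if 0 <= i < len(s) else None

# Pair each position with its string row by row, as in the zip approach above.
df['c'] = [char_at(b, a) for a, b in zip(df['a'], df['b'])]
print(df)
```

For the sample data every position is in range, so the result matches the output above.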
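The performance differs. On 3000 rows, the list comprehension is roughly 100 times faster than apply (557 µs vs. 57.3 ms):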
#[3000 rows x 2 columns]
df = pd.concat([df] * 1000, ignore_index=True)
In [236]: %timeit df['c'] = [b[a] for a, b in zip(df.a, df.b)]
557 µs ± 25.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [237]: %timeit df['c'] = df.apply(lambda x: x['b'][x['a']], axis=1)
57.3 ms ± 358 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
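The %timeit magic only works in IPython. A rough sketch of the same comparison with the standard timeit module, for running as a plain script; the function names with_zip and with_apply and the repeat count of 10 are assumptions chosen for illustration, and absolute timings will vary by machine:

```python
import timeit

import pandas as pd

df = pd.DataFrame({'a': [0, 3, 4], 'b': ['0101010', '0100010', '0111100']})
big = pd.concat([df] * 1000, ignore_index=True)  # 3000 rows x 2 columns

def with_zip(frame):
    # List comprehension over the two columns paired row by row.
    return [b[a] for a, b in zip(frame['a'], frame['b'])]

def with_apply(frame):
    # Row-wise apply; each row is passed to the lambda as a Series.
    return frame.apply(lambda x: x['b'][x['a']], axis=1).tolist()

runs = 10  # assumed repeat count, kept small because apply is slow
t_zip = timeit.timeit(lambda: with_zip(big), number=runs)
t_apply = timeit.timeit(lambda: with_apply(big), number=runs)
print(f"zip:   {t_zip / runs * 1e3:.2f} ms per call")
print(f"apply: {t_apply / runs * 1e3:.2f} ms per call")
```

Both functions return the same list of characters, so only the running time differs.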