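I am trying to slice strings in a vectorized way, but the result is NaN. It works fine when the slice bound is a constant (for example str[:1]). Any help is appreciated.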
import pandas as pd

df = pd.DataFrame({'NAME': ['abc','xyz','hello'], 'SEQ': [1,2,1]})
df['SUB'] = df['NAME'].str[:df['SEQ']]
Output:
    NAME  SEQ  SUB
0    abc    1  NaN
1    xyz    2  NaN
2  hello    1  NaN
Answer 0 (score: 2)
Unfortunately, there is no vectorized solution when the slice bound differs from row to row.
Use apply with a lambda function:
df['SUB'] = df.apply(lambda x: x['NAME'][:x['SEQ']], axis=1)
Or use zip with a list comprehension for better performance:
df['SUB'] = [x[:y] for x, y in zip(df['NAME'], df['SEQ'])]
print (df)
    NAME  SEQ  SUB
0    abc    1    a
1    xyz    2   xy
2  hello    1    h
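If the same slicing is needed in several places, the list comprehension can be wrapped in a small function. This is only a sketch: the name `prefix_by_length` is hypothetical, not a pandas API, and it assumes both columns have the same length and no missing values.

```python
import pandas as pd


def prefix_by_length(names: pd.Series, lengths: pd.Series) -> pd.Series:
    """Return names[i][:lengths[i]] for every row (hypothetical helper)."""
    # zip pairs the two columns positionally, so they must have equal length.
    values = [name[:n] for name, n in zip(names, lengths)]
    # Reuse the original index so assigning back to the frame aligns by label.
    return pd.Series(values, index=names.index)


df = pd.DataFrame({'NAME': ['abc', 'xyz', 'hello'], 'SEQ': [1, 2, 1]})
df['SUB'] = prefix_by_length(df['NAME'], df['SEQ'])
print(df['SUB'].tolist())  # ['a', 'xy', 'h']
```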
Timings:
df = pd.DataFrame({'NAME': ['abc','xyz','hello'], 'SEQ': [1,2,1]})
df = pd.concat([df] * 1000, ignore_index=True)
In [270]: %timeit df["SUB"] = df.groupby("SEQ").NAME.transform(lambda g: g.str[: g.name])
4.23 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [271]: %timeit df['SUB'] = df.apply(lambda x: x['NAME'][:x['SEQ']], axis=1)
104 ms ± 2.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [272]: %timeit df['SUB'] = [x[:y] for x, y in zip(df['NAME'], df['SEQ'])]
785 µs ± 22.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
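The `%timeit` lines above need IPython. A rough equivalent with the standard `timeit` module might look like the sketch below. The function names are made up for illustration, the functions return lists instead of assigning to `df['SUB']`, and the `int(...)` around the group key is a defensive assumption, so the numbers will not match the ones above exactly.

```python
import timeit

import pandas as pd

base = pd.DataFrame({'NAME': ['abc', 'xyz', 'hello'], 'SEQ': [1, 2, 1]})
df = pd.concat([base] * 1000, ignore_index=True)  # 3000 rows


def with_apply():
    # Row-wise apply: one Python call per row.
    return df.apply(lambda x: x['NAME'][:x['SEQ']], axis=1).tolist()


def with_zip():
    # Plain Python loop over the two columns.
    return [x[:y] for x, y in zip(df['NAME'], df['SEQ'])]


def with_groupby():
    # One vectorized .str slice per distinct SEQ value.
    # int(...) converts the group key to a plain Python int (an assumption).
    sliced = df.groupby('SEQ')['NAME'].transform(lambda g: g.str[:int(g.name)])
    return sliced.tolist()


# All three approaches should agree before they are compared for speed.
assert with_apply() == with_zip() == with_groupby()

for fn in (with_groupby, with_apply, with_zip):
    # Best of 3 repeats, 3 calls each, reported per call.
    seconds = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```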
Answer 1 (score: 0)
Use groupby:
df["SUB"] = df.groupby("SEQ").NAME.transform(lambda g: g.str[: g.name])
This may make sense if SEQ has only a few unique values.
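To see why the number of unique values matters, the groupby approach can be written as an explicit loop. This is an illustrative sketch, not what `transform` does internally: each group shares a single scalar bound, so one `.str[:n]` call handles the whole group, and the loop runs once per distinct SEQ value.

```python
import pandas as pd

df = pd.DataFrame({'NAME': ['abc', 'xyz', 'hello'], 'SEQ': [1, 2, 1]})

parts = []
for seq, names in df.groupby('SEQ')['NAME']:
    # Within one group the bound is a single scalar, so .str[:n] is valid.
    parts.append(names.str[:int(seq)])

# The pieces keep their original index labels, so assignment realigns them.
df['SUB'] = pd.concat(parts)

print(df['SUB'].tolist())   # ['a', 'xy', 'h']
print(df['SEQ'].nunique())  # 2 groups, so 2 vectorized slices
```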