我有2列数据框这样:
ITEM REFNUMS
1 00000299 0036701923024762922029229294652954429569295832...
2 00000655 NaN
24 00001791 00016027123076000158004563065131972
25 00001805 00016027123076000158004563065131972
26 00001813 00016027123076000158004563065131972
27 00001821 00016027123076000158004563065131972
28 00001937 0142530521316303164702509000510012201310027820...
我想将REFNUMS
列拆分为可分割部分,并在可能的情况下添加到现有数据框中,因为我需要保留行索引并匹配ITEM#。 REFNUMS
中的数据是5
可以整除的长度,而不是NaN
,因此例如第1行= 78套5。
data_len = (data['REFNUMS'].str.len())/5
然后
0 NaN
1 78.0
2 NaN
感谢有关如何执行此操作的任何建议。
答案 0 :(得分:1)
IIUC,您可以使用str.extractall
获取5位数的组,清理列,然后加入:
In [168]: r = df.REFNUMS.str.extractall("(\d{1,5})").unstack()
In [169]: r.columns = r.columns.droplevel(0)
In [170]: df.join(r)
Out[170]:
ITEM REFNUMS 0 1 2 3 4 5 6 7 8 9
1 299 0036701923024762922029229294652954429569295832... 00367 01923 02476 29220 29229 29465 29544 29569 29583 2
2 655 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
24 1791 00016027123076000158004563065131972 00016 02712 30760 00158 00456 30651 31972 None None None
25 1805 00016027123076000158004563065131972 00016 02712 30760 00158 00456 30651 31972 None None None
26 1813 00016027123076000158004563065131972 00016 02712 30760 00158 00456 30651 31972 None None None
27 1821 00016027123076000158004563065131972 00016 02712 30760 00158 00456 30651 31972 None None None
28 1937 0142530521316303164702509000510012201310027820... 01425 30521 31630 31647 02509 00051 00122 01310 02782 0