I have a DataFrame df3 in which one column has the following format:
df3
Out[196]:
Utterances
0 23825 141520 79229147 135 1951822935
1 15162091514 2015
2 1851315229147 114
3 225189625 141135 1144
4 1325 31854920 31184
5 31854920 31184
6 2085185-5719 19151352089147 1514 1325 229191
7 2085185 919 114 11618 129115 2015 3151329145 2...
8 185351420 193113 21815115 9142015 1325 3151316...
9 851216 2015 7520 1325 6152118 8211441854 41512...
10 31143512 15184518
11 31143512 15184518
12 13315211420 172151825
13 229191
14 16518191514112
15 9 15235 1514 19516205132518 20235142025 149142...
16 9 14554 2015 69144 152120 2385185 1325 14523 3...
I need to create a numpy array with the following format:
array=[[23825, 141520, 79229147, 135, 1951822935], [15162091514, 2015], [1851315229147, 114], [.....]]
Also, df3.values does not work, because it returns each row as a single string:
array([['23825 141520 79229147 135 1951822935'],
['15162091514 2015'],
['1851315229147 114'],
[....]],
Any help is appreciated, thanks.
Answer 0 (score: 0)
Use map with str.split (as written, this relies on Python 2, where map returns a list):
In [5639]: np.array(map(str.split, df.Utterances.values))
Out[5639]:
array([['23825', '141520', '79229147', '135', '1951822935'],
['15162091514', '2015'], ['1851315229147', '114'],
['225189625', '141135', '1144'], ['1325', '31854920', '31184'],
['31854920', '31184'],
['2085185-5719', '19151352089147', '1514', '1325', '229191'],
['2085185', '919', '114', '11618', '129115', '2015', '3151329145', '2'],
['185351420', '193113', '21815115', '9142015', '1325', '3151316'],
['851216', '2015', '7520', '1325', '6152118', '8211441854', '41512'],
['31143512', '15184518'], ['31143512', '15184518'],
['13315211420', '172151825'], ['229191'], ['16518191514112'],
['9', '15235', '1514', '19516205132518', '20235142025', '149142'],
['9', '14554', '2015', '69144', '152120', '2385185', '1325', '14523', '3']],
dtype=object)
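On Python 3, map returns an iterator rather than a list, and recent NumPy releases may raise an error when asked to build an array from rows of unequal length without dtype=object. The following is a minimal sketch of an equivalent approach under those assumptions. The three-row frame is a hypothetical stand-in for the real data, and the values are still strings, as in the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the same column name as in the question.
df = pd.DataFrame({
    "Utterances": [
        "23825 141520 79229147 135 1951822935",
        "15162091514 2015",
        "1851315229147 114",
    ]
})

# Materialize the map iterator so the rows exist as a list of lists.
rows = list(map(str.split, df["Utterances"].tolist()))

# Allocate a 1-D object array and fill it row by row. This avoids
# NumPy trying to stack the ragged rows into a 2-D array.
arr = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    arr[i] = row
```

Each element of arr is then an ordinary Python list of string tokens, one list per DataFrame row.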
Alternatively, with a list comprehension:
In [5642]: np.array([x.split() for x in df.Utterances.values])
Out[5642]:
array([['23825', '141520', '79229147', '135', '1951822935'],
['15162091514', '2015'], ['1851315229147', '114'],
['225189625', '141135', '1144'], ['1325', '31854920', '31184'],
['31854920', '31184'],
['2085185-5719', '19151352089147', '1514', '1325', '229191'],
['2085185', '919', '114', '11618', '129115', '2015', '3151329145', '2'],
['185351420', '193113', '21815115', '9142015', '1325', '3151316'],
['851216', '2015', '7520', '1325', '6152118', '8211441854', '41512'],
['31143512', '15184518'], ['31143512', '15184518'],
['13315211420', '172151825'], ['229191'], ['16518191514112'],
['9', '15235', '1514', '19516205132518', '20235142025', '149142'],
['9', '14554', '2015', '69144', '152120', '2385185', '1325', '14523', '3']],
dtype=object)
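Both outputs above contain strings, whereas the question asks for numbers. One possible way to convert them is sketched below. The helper to_int_or_str is a hypothetical name introduced here, not part of pandas or NumPy, and leaving non-numeric tokens such as '2085185-5719' as strings is an assumption about the desired behaviour.

```python
import numpy as np

def to_int_or_str(token):
    """Hypothetical helper: return int(token) if possible, else the token unchanged."""
    try:
        return int(token)
    except ValueError:
        return token

# Two sample rows taken from the question's data.
utterances = [
    "23825 141520 79229147 135 1951822935",
    "2085185-5719 19151352089147 1514 1325 229191",
]

# Split each row on whitespace and convert each token where possible.
rows = [[to_int_or_str(t) for t in u.split()] for u in utterances]

# Store the ragged rows in a 1-D object array, one Python list per row.
result = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    result[i] = row
```

Because the rows are stored as Python lists inside an object array, large values such as 19151352089147 are kept as Python integers and are not limited by a fixed-width NumPy integer type.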