Python - Converting DataFrame elements to np.arrays

Date: 2017-10-24 14:07:24

Tags: python arrays pandas numpy dataframe

I have a DataFrame df3 in which one column has the following format:

df3
Out[196]: 
                                              Utterances
0                   23825 141520 79229147 135 1951822935
1                                       15162091514 2015
2                                      1851315229147 114
3                                  225189625 141135 1144
4                                    1325 31854920 31184
5                                         31854920 31184
6           2085185-5719 19151352089147 1514 1325 229191
7      2085185 919 114 11618 129115 2015 3151329145 2...
8      185351420 193113 21815115 9142015 1325 3151316...
9      851216 2015 7520 1325 6152118 8211441854 41512...
10                                     31143512 15184518
11                                     31143512 15184518
12                                 13315211420 172151825
13                                                229191
14                                        16518191514112
15     9 15235 1514 19516205132518 20235142025 149142...
16     9 14554 2015 69144 152120 2385185 1325 14523 3...

I need to create a numpy array with the following format:

array=[[23825, 141520, 79229147, 135, 1951822935], [15162091514, 2015], [1851315229147, 114], [.....]]

Calling df3.values does not work, because each row comes back as a single unsplit string:

array([['23825 141520 79229147 135 1951822935'],
       ['15162091514 2015'],
       ['1851315229147 114'],
       [....]],

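Any help is appreciated, thanks.

1 Answer:

Answer 0: (score: 0)

Use str.split on each value of the column (the session below appears to come from Python 2, where map returns a list):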

In [5639]: np.array(map(str.split, df.Utterances.values))
Out[5639]:
array([['23825', '141520', '79229147', '135', '1951822935'],
       ['15162091514', '2015'], ['1851315229147', '114'],
       ['225189625', '141135', '1144'], ['1325', '31854920', '31184'],
       ['31854920', '31184'],
       ['2085185-5719', '19151352089147', '1514', '1325', '229191'],
       ['2085185', '919', '114', '11618', '129115', '2015', '3151329145', '2'],
       ['185351420', '193113', '21815115', '9142015', '1325', '3151316'],
       ['851216', '2015', '7520', '1325', '6152118', '8211441854', '41512'],
       ['31143512', '15184518'], ['31143512', '15184518'],
       ['13315211420', '172151825'], ['229191'], ['16518191514112'],
       ['9', '15235', '1514', '19516205132518', '20235142025', '149142'],
       ['9', '14554', '2015', '69144', '152120', '2385185', '1325', '14523', '3']], 
       dtype=object)
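
On Python 3, map returns an iterator rather than a list, and recent NumPy versions raise an error when asked to build an array from rows of unequal length unless dtype=object is given. The following is a minimal sketch that avoids both problems. It uses a small made-up frame in place of df3 (only the first three rows of the question), and the names rows and tokens are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df3: the first three rows from the question.
df3 = pd.DataFrame({"Utterances": [
    "23825 141520 79229147 135 1951822935",
    "15162091514 2015",
    "1851315229147 114",
]})

# list(...) materialises the iterator that map returns on Python 3.
rows = list(map(str.split, df3["Utterances"]))

# Fill a pre-allocated 1-D object array element by element, so every row
# stays a separate Python list whether or not the rows have equal length.
tokens = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    tokens[i] = row

print(tokens)
```

Filling the array in a loop is a deliberate choice here: passing the nested list straight to np.array would produce a 2-D array whenever all rows happen to have the same number of tokens.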

Alternatively, with a list comprehension:

In [5642]: np.array([x.split() for x in df.Utterances.values])
Out[5642]:
array([['23825', '141520', '79229147', '135', '1951822935'],
       ['15162091514', '2015'], ['1851315229147', '114'],
       ['225189625', '141135', '1144'], ['1325', '31854920', '31184'],
       ['31854920', '31184'],
       ['2085185-5719', '19151352089147', '1514', '1325', '229191'],
       ['2085185', '919', '114', '11618', '129115', '2015', '3151329145', '2'],
       ['185351420', '193113', '21815115', '9142015', '1325', '3151316'],
       ['851216', '2015', '7520', '1325', '6152118', '8211441854', '41512'],
       ['31143512', '15184518'], ['31143512', '15184518'],
       ['13315211420', '172151825'], ['229191'], ['16518191514112'],
       ['9', '15235', '1514', '19516205132518', '20235142025', '149142'],
       ['9', '14554', '2015', '69144', '152120', '2385185', '1325', '14523', '3']], 
       dtype=object)
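
Both results above contain strings, whereas the question asks for numbers. A possible follow-up step is sketched below. It assumes that tokens which are not plain integers (such as '2085185-5719' in row 6) should be kept as strings; that policy, the helper name to_ints, and the three-row sample frame are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two rows from the question plus one shortened row
# containing a token that is not a plain integer.
df3 = pd.DataFrame({"Utterances": [
    "23825 141520 79229147 135 1951822935",
    "15162091514 2015",
    "2085185-5719 1514",
]})


def to_ints(text):
    """Split on whitespace and convert each token to int.

    Tokens that int() rejects are kept unchanged as strings
    (an assumed policy; dropping or flagging them is equally valid).
    """
    converted = []
    for token in text.split():
        try:
            converted.append(int(token))
        except ValueError:
            converted.append(token)
    return converted


rows = [to_ints(s) for s in df3["Utterances"]]

# Store each row as one element of a 1-D object array, since the rows
# have different lengths and cannot form a regular 2-D array.
result = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    result[i] = row

print(result)
```

Python integers have arbitrary precision, so large values such as 15162091514 convert without overflow as long as they stay inside Python lists rather than a fixed-width NumPy integer array.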