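Splitting a column in a DataFrame using a string

Time: 2019-05-24 06:27:38

Tags: python dataframe

I want to split one column of a DataFrame into two columns, as shown below. See column 0.5, which is the column I want to add. My current code does not work.

Sample data: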

{"offset":"14726172634","bids":[["871094.22000","0.00200000","0.00200000","0","1081537351","29194","5","14726172633","1"],["871076.11000","0.00808000","0.00808000","0","1081537130","623964","5","14726172043","1"],["871073.96500","0.00100000","0.00100000","0","1081537185","29194","5","14726172231","1"],["871042.87000","0.00500000","0.00500000","0","1081536781","29194","5","14726171235","1"],["871038.55000","0.00500000","0.00500000","0","1081537169","29194","5","14726172161","1"],["871032.90156","0.00100000","0.00100000","0","1081537343","29194","5","14726172616","1"],

Code:

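import sys
import pandas as pd

# Treat "]" as the line terminator so that each bid entry becomes one row.
data = pd.read_csv('20190523-012523_product_5_snapshot_14726172634_14728561053.txt', lineterminator=']', low_memory=False, error_bad_lines=False, header=None)

# Split column 1 on the first ":". Note that the result is never written back to data.
new = data[1].str.split(":", n=1, expand=True)
data.to_csv('parser.csv')
print(data)
sys.exit()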

Current output:

                              0                      1         2         3    4             5         6    7             8    9
 0       {"offset":"14726172634"  bids:[["871094.22000"  0.002000  0.002000  0.0  1.081537e+09   29194.0  5.0  1.472617e+10  1.0
 1                           NaN        ["871076.11000"  0.008080  0.008080  0.0  1.081537e+09  623964.0  5.0  1.472617e+10  1.0
 2                           NaN        ["871073.96500"  0.001000  0.001000  0.0  1.081537e+09   29194.0  5.0  1.472617e+10  1.0
 3                           NaN        ["871042.87000"  0.005000  0.005000  0.0  1.081537e+09   29194.0  5.0  1.472617e+10  1.0
 4                           NaN        ["871038.55000"  0.005000  0.005000  0.0  1.081537e+09   29194.0  5.0  1.472617e+10  1.0
 5                           NaN        ["871032.90156"  0.001000  0.001000  0.0  1.081537e+09   29194.0  5.0  1.472617e+10  1.0

Desired output:

                              0    0.5                  1         2         3    4             5         6    7             8    9
 0       {"offset":"14726172634"  bids:   [["871094.22000"  0.002000  0.002000  0.0  1.081537e+09   29194.0  5.0  1.472617e+10  1.0
 1                           NaN   NaN     ["871076.11000"  0.008080  0.008080  0.0  1.081537e+09  623964.0  5.0  1.472617e+10  1.0
 2                           NaN   NaN     ["871073.96500"  0.001000  0.001000  0.0  1.081537e+09   29194.0  5.0  1.472617e+10  1.0
 3                           NaN   NaN     ["871042.87000"  0.005000  0.005000  0.0  1.081537e+09   29194.0  5.0  1.472617e+10  1.0
 4                           NaN   NaN     ["871038.55000"  0.005000  0.005000  0.0  1.081537e+09   29194.0  5.0  1.472617e+10  1.0
 5                           NaN   NaN     ["871032.90156"  0.001000  0.001000  0.0  1.081537e+09   29194.0  5.0  1.472617e+10  1.0
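
For reference, one way to reach the layout above is sketched below. It is only a sketch under assumptions: the small frame is a hypothetical stand-in for what read_csv returns in the question, and the split result is written back with DataFrame.insert, which the code in the question never does. Rows that contain no ":" need separate handling, because str.split puts their whole text in the first part.

```python
import pandas as pd

# Hypothetical stand-in for the frame produced by read_csv in the question.
data = pd.DataFrame({
    0: ['{"offset":"14726172634"', None, None],
    1: ['bids:[["871094.22000"', '["871076.11000"', '["871073.96500"'],
    2: [0.002, 0.00808, 0.001],
})

# Split on the first ":" only.
parts = data[1].str.split(":", n=1, expand=True)

# Only rows that actually contain ":" carry a label such as "bids".
has_label = data[1].str.contains(":", regex=False)

# Label column: "bids:" where a label exists, NaN elsewhere.
label = (parts[0] + ":").where(has_label)

# Remainder: text after ":" where a label exists, the untouched value otherwise.
rest = parts[1].where(has_label, data[1])

# Write the remainder back, then place the label column at position 1 as 0.5.
data[1] = rest
data.insert(1, 0.5, label)

print(data)
```

The column name 0.5 is taken from the desired output; any other label would work the same way.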

Edit: I managed to find a solution with the following code:

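import sys
import pandas as pd

# Treat "]" as the line terminator so that each bid entry becomes one row.
data = pd.read_csv('20190523-012523_product_5_snapshot_14726172634_14728561053.txt', lineterminator=']', low_memory=False, error_bad_lines=False, header=None)#, names= ['a','d','f','r','y','h','n','m','k'])

# Split column 1 on the first "[" instead of ":".
new = data[1].str.split("[", n=1, expand=True)

# Column 1 keeps the text before "[", column 10 receives the text after it.
data[1] = new[0]
data[10] = new[1]
# Drop the last row.
data.drop(data.index[-1], inplace=True)

# Remove the leftover "[" and quote characters from the value.
data[10] = new[1].str.strip('[').str.strip('"')

data = data.set_index(1)
data = data.loc[:, [2, 10]]

# Show which of the remaining columns were parsed as float64.
check = data.select_dtypes(include=['float64'])
print(check)

data.to_csv('parser.csv')
print(data)
sys.exit()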
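
As a possible alternative to splitting on "[" and stripping characters afterwards, the sample data looks like JSON, so it could be parsed with the json module instead of read_csv. This is a different technique from the code above and only a sketch under assumptions: the string below is a shortened, complete stand-in built from the sample rows, and the real file's full layout is not shown here, so it may not parse as a single JSON object.

```python
import json
import pandas as pd

# Shortened, complete stand-in for the snapshot text (an assumption; the
# sample in the question is truncated).
raw = (
    '{"offset":"14726172634","bids":['
    '["871094.22000","0.00200000","0.00200000","0","1081537351","29194","5","14726172633","1"],'
    '["871076.11000","0.00808000","0.00808000","0","1081537130","623964","5","14726172043","1"]'
    ']}'
)

snapshot = json.loads(raw)

# Each inner list becomes one row; columns are numbered 0..8.
bids = pd.DataFrame(snapshot["bids"])

# The JSON values are strings, so convert the first two columns explicitly.
bids = bids.astype({0: float, 1: float})

print(snapshot["offset"])
print(bids)
```

Parsing the structure directly keeps the offset and the bid rows separate, so no string clean-up is needed afterwards.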

0 answers:

No answers