I want to split one DataFrame column into two, as shown below. See column 0.5, which is the column I want to add. My current code does not work.
Sample data:
{"offset":"14726172634","bids":[["871094.22000","0.00200000","0.00200000","0","1081537351","29194","5","14726172633","1"],["871076.11000","0.00808000","0.00808000","0","1081537130","623964","5","14726172043","1"],["871073.96500","0.00100000","0.00100000","0","1081537185","29194","5","14726172231","1"],["871042.87000","0.00500000","0.00500000","0","1081536781","29194","5","14726171235","1"],["871038.55000","0.00500000","0.00500000","0","1081537169","29194","5","14726172161","1"],["871032.90156","0.00100000","0.00100000","0","1081537343","29194","5","14726172616","1"],
Code:
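import sys
import pandas as pd

# ']' as the line terminator puts each bid on its own row
# (on_bad_lines='skip' replaces error_bad_lines=False, deprecated in pandas 1.3 and removed in 2.0)
data = pd.read_csv('20190523-012523_product_5_snapshot_14726172634_14728561053.txt', lineterminator=']', low_memory=False, on_bad_lines='skip', header=None)
new = data[1].str.split(":", n=1, expand=True)  # note: `new` is never written back into `data`
data.to_csv('parser.csv')
print(data)
sys.exit()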
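One reason the output below is unchanged: `str.split(..., expand=True)` returns a separate DataFrame, and `data` is only modified once the pieces are assigned back. A minimal sketch on a hypothetical two-row frame (values copied from the sample, not read from the file):

```python
import pandas as pd

# Hypothetical two-row frame standing in for column 1 of the parsed file
data = pd.DataFrame({1: ['bids:[["871094.22000"', '["871076.11000"']})

# str.split(..., expand=True) returns a NEW DataFrame; `data` itself is unchanged
new = data[1].str.split("[", n=1, expand=True)
print(data.columns.tolist())  # [1]
print(new.columns.tolist())   # [0, 1]

# The pieces only appear in `data` once they are assigned back
data[1] = new[0]
data[10] = new[1]
print(data)
```

Rows whose value already starts with "[" get an empty string, not NaN, as their prefix.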
Current output:
0 1 2 3 4 5 6 7 8 9
0 {"offset":"14726172634" bids:[["871094.22000" 0.002000 0.002000 0.0 1.081537e+09 29194.0 5.0 1.472617e+10 1.0
1 NaN ["871076.11000" 0.008080 0.008080 0.0 1.081537e+09 623964.0 5.0 1.472617e+10 1.0
2 NaN ["871073.96500" 0.001000 0.001000 0.0 1.081537e+09 29194.0 5.0 1.472617e+10 1.0
3 NaN ["871042.87000" 0.005000 0.005000 0.0 1.081537e+09 29194.0 5.0 1.472617e+10 1.0
4 NaN ["871038.55000" 0.005000 0.005000 0.0 1.081537e+09 29194.0 5.0 1.472617e+10 1.0
5 NaN ["871032.90156" 0.001000 0.001000 0.0 1.081537e+09 29194.0 5.0 1.472617e+10 1.0
Desired output:
0 0.5 1 2 3 4 5 6 7 8 9
0 {"offset":"14726172634" bids: [["871094.22000" 0.002000 0.002000 0.0 1.081537e+09 29194.0 5.0 1.472617e+10 1.0
1 NaN NaN ["871076.11000" 0.008080 0.008080 0.0 1.081537e+09 623964.0 5.0 1.472617e+10 1.0
2 NaN NaN ["871073.96500" 0.001000 0.001000 0.0 1.081537e+09 29194.0 5.0 1.472617e+10 1.0
3 NaN NaN ["871042.87000" 0.005000 0.005000 0.0 1.081537e+09 29194.0 5.0 1.472617e+10 1.0
4 NaN NaN ["871038.55000" 0.005000 0.005000 0.0 1.081537e+09 29194.0 5.0 1.472617e+10 1.0
5 NaN NaN ["871032.90156" 0.001000 0.001000 0.0 1.081537e+09 29194.0 5.0 1.472617e+10 1.0
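A sketch of one way to get the layout above, assuming the column labels are the integers `read_csv` assigns with `header=None`: split column 1 at the first "[", put back the "[" that the split consumes, and use `DataFrame.insert` to place the prefix in a new column labelled 0.5. The frame below is a hypothetical stand-in for the first three columns of the file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the first columns of the parsed file
data = pd.DataFrame({
    0: ['{"offset":"14726172634"', np.nan],
    1: ['bids:[["871094.22000"', '["871076.11000"'],
    2: [0.002, 0.00808],
})

# Split column 1 at the first "[": the prefix ("bids:") and the remainder
parts = data[1].str.split("[", n=1, expand=True)

# Rows whose value starts with "[" give an empty prefix; show those as NaN
prefix = parts[0].mask(parts[0] == "")

# Restore the "[" that split() consumed, then insert the prefix as a new
# column labelled 0.5 at position 1 (between columns 0 and 1)
data[1] = "[" + parts[1]
data.insert(1, 0.5, prefix)
print(data)
```

Because 0.5 is a float, pandas upcasts the column labels to floats (0.0, 0.5, 1.0, 2.0); `data[1]` still finds the column labelled 1.0.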
EDIT: I managed to solve it with the following code:
import sys
import pandas as pd

data = pd.read_csv('20190523-012523_product_5_snapshot_14726172634_14728561053.txt', lineterminator=']', low_memory=False, on_bad_lines='skip', header=None)  #, names= ['a','d','f','r','y','h','n','m','k'])
# Split column 1 at the first "[" into the prefix ("bids:" on the first row) and the rest
new = data[1].str.split("[", n=1, expand=True)
data[1] = new[0]
# Drop the last row (presumably the trailing fragment after the final ']')
data.drop(data.index[-1], inplace=True)
# Keep only the price: strip the leftover '[' and '"' characters
data[10] = new[1].str.strip('[').str.strip('"')
data = data.set_index(1)
data = data.loc[:, [2, 10]]
check = data.select_dtypes(include=['float64'])
print(check)
data.to_csv('parser.csv')
print(data)
sys.exit()
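Since the file appears to be JSON, an alternative (a different technique from the `lineterminator` trick above) is to parse it with the standard `json` module and build the frame from the `bids` list directly. This assumes each snapshot file holds one complete JSON object with `offset` and `bids` keys; the sample in the question is truncated, so that is not confirmed. The string below is a shortened, hypothetical version of the sample:

```python
import json
import pandas as pd

# Hypothetical, shortened snapshot: the real file is assumed to hold one
# complete JSON object with at least "offset" and "bids" keys
raw = ('{"offset":"14726172634","bids":['
       '["871094.22000","0.00200000","0.00200000","0","1081537351","29194","5","14726172633","1"],'
       '["871076.11000","0.00808000","0.00808000","0","1081537130","623964","5","14726172043","1"]]}')

snapshot = json.loads(raw)

# One row per bid; every field arrives as a string, so convert explicitly
bids = pd.DataFrame(snapshot["bids"]).apply(pd.to_numeric)

# Repeat the snapshot offset on every row as the first column
bids.insert(0, "offset", int(snapshot["offset"]))
print(bids)
```

For the real file, `with open(path) as f: snapshot = json.load(f)` would replace the `json.loads` call. The nine bid fields keep the default labels 0 to 8 because their meaning is not given in the question.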