How to split a column into two in pandas

Time: 2017-01-31 09:45:38

Tags: python pandas

I have a DataFrame like this:

 date
2015-04-18 06:00:05     10  7260.0      NaN     NaN     000000008000
2015-04-18 06:00:11     10  7260.0      NaN     NaN     000000008000
2015-04-18 06:00:17     10  7260.0      NaN     NaN     000000008000
2015-04-18 06:00:23     10  12270.0     NaN     NaN     000000000000
2015-04-18 06:00:30     10  11610.0     NaN     NaN     000000000000
2015-04-18 06:00:36     10  11580.0     NaN     NaN     000000000000

Now I want to split the second column into two columns. The expected output is:

2015-04-18 06:00:05     1 0     7260.0      NaN     NaN     000000008000
2015-04-18 06:00:11     1 0     7260.0      NaN     NaN     000000008000
2015-04-18 06:00:17     1 0     7260.0      NaN     NaN     000000008000
2015-04-18 06:00:23     1 0     12270.0     NaN     NaN     000000000000
2015-04-18 06:00:30     1 0     11610.0     NaN     NaN     000000000000
2015-04-18 06:00:36     1 0     11580.0     NaN     NaN     000000000000

I read the DataFrame with the following code:

 from pandas import DataFrame
 import numpy as np
 import pandas as pd
 import matplotlib.pyplot as plt
 df = pd.read_csv('data.txt', sep=',', parse_dates={'Date':[0]}, index_col='Date', header=None, keep_default_na=False, na_values='-9999')
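Because data.txt itself is not shown, here is a minimal sketch that builds a comparable frame from an in-memory string so the answers below can be tried directly. It assumes the file is comma-separated with empty fields where the table shows NaN. The rows are copied from the sample above, and the name raw is purely illustrative. It also drops keep_default_na=False so that the empty fields become NaN, and it uses the simpler list form parse_dates=[0] instead of the dict form.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.txt: six rows copied from the sample above.
# Assumption: the real file is comma-separated and uses empty fields for NaN.
raw = """2015-04-18 06:00:05,10,7260.0,,,000000008000
2015-04-18 06:00:11,10,7260.0,,,000000008000
2015-04-18 06:00:17,10,7260.0,,,000000008000
2015-04-18 06:00:23,10,12270.0,,,000000000000
2015-04-18 06:00:30,10,11610.0,,,000000000000
2015-04-18 06:00:36,10,11580.0,,,000000000000
"""

# Parse column 0 as datetimes and use it as the index; the remaining
# columns keep the default integer labels 1..5.
df = pd.read_csv(io.StringIO(raw), header=None, parse_dates=[0], index_col=0)
df.index.name = 'date'
print(df)
```

Column 5 is read as an integer, so the leading zeros are lost (8000 and 0). This matches the frame printed in the answer below.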

How can I split the second column?

1 answer:

Answer 0 (score: 0)

I think you need indexing with .str:

print (df)
                      1        2   3   4     5
date                                          
2015-04-18 06:00:05  10   7260.0 NaN NaN  8000
2015-04-18 06:00:11  10   7260.0 NaN NaN  8000
2015-04-18 06:00:17  10   7260.0 NaN NaN  8000
2015-04-18 06:00:23  10  12270.0 NaN NaN     0
2015-04-18 06:00:30  10  11610.0 NaN NaN     0
2015-04-18 06:00:36  10  11580.0 NaN NaN     0


#split column 1 into its first and second characters
df['a1'], df['a2'] = df[1].astype(str).str[0], df[1].astype(str).str[1]
#remove the original column 1
df = df.drop(1, axis=1)
#reorder columns so that a1 and a2 come first
df = df[['a1','a2'] + df.columns[:-2].tolist()]
print (df)
                    a1 a2        2   3   4     5
date                                            
2015-04-18 06:00:05  1  0   7260.0 NaN NaN  8000
2015-04-18 06:00:11  1  0   7260.0 NaN NaN  8000
2015-04-18 06:00:17  1  0   7260.0 NaN NaN  8000
2015-04-18 06:00:23  1  0  12270.0 NaN NaN     0
2015-04-18 06:00:30  1  0  11610.0 NaN NaN     0
2015-04-18 06:00:36  1  0  11580.0 NaN NaN     0
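If column 1 always holds a non-negative two-digit integer, which is an assumption based only on the sample values, arithmetic can replace the string round trip. This is a swapped-in technique rather than the .str indexing shown above. Floor division by 10 gives the tens digit and modulo 10 gives the ones digit, so the new columns stay numeric instead of becoming strings. The small frame below is a hypothetical stand-in for the question's data.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame (only columns 1 and 2).
df = pd.DataFrame({1: [10, 10, 10], 2: [7260.0, 7260.0, 12270.0]})

# Assumption: every value in column 1 is a non-negative two-digit integer.
# Floor division yields the tens digit and modulo yields the ones digit;
# both results keep an integer dtype.
df.insert(0, 'a1', df[1] // 10)
df.insert(1, 'a2', df[1] % 10)

# Drop the original column now that its digits live in a1 and a2.
df = df.drop(columns=1)
print(df)
```

For values outside that range, for example 5 or 123, this produces misleading digits. The string-based approach is safer when the width is not guaranteed.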

Another solution with DataFrame.insert:

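#convert column 1 to strings once and reuse it
ser = df[1].astype(str)
#insert the first and second characters as new columns at positions 0 and 1
df.insert(0, 'a1',ser.str[0])
df.insert(1, 'a2',ser.str[1])
#remove the original column 1
df = df.drop(1, axis=1)
#reassign default integer column names
df.columns = np.arange(len(df.columns))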
print (df)
                     0  1        2   3   4     5
date                                            
2015-04-18 06:00:05  1  0   7260.0 NaN NaN  8000
2015-04-18 06:00:11  1  0   7260.0 NaN NaN  8000
2015-04-18 06:00:17  1  0   7260.0 NaN NaN  8000
2015-04-18 06:00:23  1  0  12270.0 NaN NaN     0
2015-04-18 06:00:30  1  0  11610.0 NaN NaN     0
2015-04-18 06:00:36  1  0  11580.0 NaN NaN     0
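A third variant, sketched here as an alternative and not taken from the answer, uses Series.str.extract. It applies a regular expression with two capture groups, and each group becomes its own column in a single call. The pattern below assumes the value has exactly two digits, so anything else comes back as NaN instead of being silently truncated. The sample frame and the names parts, a1 and a2 are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame (only columns 1 and 2).
df = pd.DataFrame({1: [10, 10, 10], 2: [7260.0, 7260.0, 12270.0]})

# Two capture groups -> two columns (labelled 0 and 1 by extract).
# Assumption: the value has exactly two digits; other values give NaN.
parts = df[1].astype(str).str.extract(r'^(\d)(\d)$')
parts.columns = ['a1', 'a2']

# Place the new columns first and drop the original column 1;
# concat aligns both pieces on the shared index.
df = pd.concat([parts, df.drop(columns=1)], axis=1)
print(df)
```

As with the .str[0] and .str[1] version, a1 and a2 hold strings. Apply astype(int) afterwards if numeric digits are needed.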