Splitting a column in a pandas DataFrame

Date: 2018-12-05 12:16:10

Tags: python pandas

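I want to split the ji column of df into two columns on the comma delimiter. Removing the parentheses around the ji values would also be nice. I have tried various approaches and keep getting errors. For now I would like to avoid using a lambda expression! Any other ideas?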

Example

      ji           length
0     (75.0, 5.0)  3283.458479
1     (96.0, 5.0)  1431.312901
2     (97.0, 5.0)  1364.592959
3    (247.0, 5.0)  3736.322308
4     (81.0, 7.0)  2655.910005
5     (93.0, 7.0)  1752.293687
6    (242.0, 7.0)   427.844417
7    (248.0, 7.0)  3725.823013
8    (254.0, 7.0)  2318.937332
9    (255.0, 7.0)  2292.673905
10   (242.0, 8.0)   145.811907
11   (254.0, 8.0)  2222.447786
12   (255.0, 8.0)  2196.184360
13   (248.0, 9.0)   441.222866
14   (253.0, 9.0)   853.095032
15   (256.0, 9.0)  2076.942682
16   (91.0, 10.0)  1743.310744
17   (93.0, 10.0)  1256.337420
18  (105.0, 10.0)   523.447658
19  (174.0, 10.0)  1530.617012
20  (176.0, 10.0)  1697.614009
21  (248.0, 10.0)   440.000463
22  (253.0, 10.0)   904.706003
23  (256.0, 10.0)  1991.662604
24  (258.0, 10.0)  1850.995862
25  (172.0, 11.0)  1301.179960
26  (174.0, 11.0)  1436.984094
27  (176.0, 11.0)  1695.954099
28  (179.0, 11.0)  1548.015013
29  (228.0, 11.0)  4640.928585
30  (242.0, 11.0)   169.617203
31  (251.0, 11.0)   784.921333
32  (253.0, 11.0)   983.118859
33  (255.0, 11.0)  1181.474433
34  (256.0, 11.0)  1303.398235

3 answers:

Answer 0 (score: 4)

Solution if the ji column contains strings: use pop to extract the column, then strip together with split and expand=True to get a DataFrame:

print (type(df.loc[0, 'ji']))
<class 'str'>

df[['a','b']] = df.pop('ji').str.strip('()').str.split(', ', expand=True).astype(float)

Or, if there are no missing values and performance is important, use a list comprehension:

L = [x.strip('()').split(', ') for x in df.pop('ji')]
df[['a','b']] = pd.DataFrame(L, index=df.index).astype(float)

print (df)
         length      a     b
0   3283.458479   75.0   5.0
1   1431.312901   96.0   5.0
2   1364.592959   97.0   5.0
3   3736.322308  247.0   5.0
4   2655.910005   81.0   7.0
5   1752.293687   93.0   7.0
6    427.844417  242.0   7.0
7   3725.823013  248.0   7.0
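
For readers who want to try the string case end to end, here is a minimal, self-contained sketch of both variants above. The three-row sample frame is made up for illustration and only mimics the question's data; it assumes every value in ji is a string of the form "(x, y)" with a comma followed by a single space.

```python
import pandas as pd

# Hypothetical sample data: 'ji' holds strings that merely look like tuples.
data = {
    "ji": ["(75.0, 5.0)", "(96.0, 5.0)", "(247.0, 5.0)"],
    "length": [3283.458479, 1431.312901, 3736.322308],
}

# Variant 1: vectorised string methods.
df = pd.DataFrame(data)
# pop() removes 'ji' from df and returns it; strip the parentheses,
# split on ", " into two columns, then convert the pieces to float.
df[["a", "b"]] = (
    df.pop("ji").str.strip("()").str.split(", ", expand=True).astype(float)
)

# Variant 2: plain list comprehension (assumes no missing values).
df2 = pd.DataFrame(data)
pairs = [s.strip("()").split(", ") for s in df2.pop("ji")]
df2[["a", "b"]] = pd.DataFrame(pairs, index=df2.index).astype(float)

print(df)
print(df.equals(df2))
```

Both variants should leave the frame with the columns length, a and b. The list comprehension would raise on a missing value because NaN has no strip method, which is why the answer restricts it to data without missing values.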

If the values are tuples, create a nested list of tuples and pass it to the DataFrame constructor:

print (type(df.loc[0, 'ji']))
<class 'tuple'>

df[['a','b']] = pd.DataFrame(df.pop('ji').values.tolist(), index=df.index)
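
A small sketch of the tuple case, using a made-up frame with a non-default index to show why index=df.index is passed. Without it, the new frame would get a fresh 0..n-1 index and the assignment would align on labels that do not exist in df.

```python
import pandas as pd

# Hypothetical sample data: 'ji' holds real tuples, and the index is not 0..n-1.
df = pd.DataFrame(
    {
        "ji": [(75.0, 5.0), (96.0, 5.0), (247.0, 5.0)],
        "length": [3283.458479, 1431.312901, 3736.322308],
    },
    index=[10, 20, 30],
)

# tolist() yields a list of tuples; the constructor turns each tuple into a row.
# Reusing df.index keeps the new columns aligned with the original rows.
df[["a", "b"]] = pd.DataFrame(df.pop("ji").tolist(), index=df.index)

print(df)
```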

Answer 1 (score: 2)

Edit:

If 'ji' contains tuples, it is much simpler:

df[['j', 'i']] = df.pop('ji').apply(pd.Series)

Given

>>> df                                                                            
            ji       length
0   (75.0,5.0)  3283.458479
1   (96.0,5.0)  1431.312901
2   (97.0,5.0)  1364.592959
3  (247.0,5.0)  3736.322308
4   (81.0,7.0)  2655.910005
>>>
>>> df.dtypes                                                                     
ji         object
length    float64
dtype: object

that is, when the 'ji' column contains strings, I would use ast.literal_eval here:

>>> from ast import literal_eval
>>> def split_to_df(string):
...     return pd.Series(literal_eval(string))
...
>>> df[['val1', 'val2']] = df.pop('ji').apply(split_to_df)
>>> df
        length   val1  val2
0  3283.458479   75.0   5.0
1  1431.312901   96.0   5.0
2  1364.592959   97.0   5.0
3  3736.322308  247.0   5.0
4  2655.910005   81.0   7.0

(The use of pop was inspired by jezrael's answer.)
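
Returning a pd.Series from apply builds one Series per row, which can be slow on large frames. As a possible alternative, the sketch below parses each string with literal_eval and builds the two columns in a single DataFrame constructor call. The sample frame is made up, and it assumes every string in ji is a valid Python tuple literal.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical sample data: 'ji' holds strings such as "(75.0,5.0)".
df = pd.DataFrame(
    {
        "ji": ["(75.0,5.0)", "(96.0,5.0)", "(81.0,7.0)"],
        "length": [3283.458479, 1431.312901, 2655.910005],
    }
)

# literal_eval safely turns each string into a real tuple of floats.
parsed = df.pop("ji").map(literal_eval)

# Build both columns at once instead of creating a Series per row.
df[["val1", "val2"]] = pd.DataFrame(parsed.tolist(), index=df.index)

print(df)
```

Because literal_eval parses Python literals, this does not depend on whether a space follows the comma.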

Answer 2 (score: 1)

You need:

df['a'] = df['ji'].apply(lambda x: x[0])
df['b'] = df['ji'].apply(lambda x: x[1])

df.drop(['ji'], axis=1, inplace=True)
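
Since the question asks to avoid lambda, here is a sketch of the same idea without one, using zip to transpose the tuples. It assumes, like the code above, that ji holds real tuples of length two, and that the frame is not empty. The sample data is made up.

```python
import pandas as pd

# Hypothetical sample data: 'ji' holds real tuples.
df = pd.DataFrame(
    {
        "ji": [(75.0, 5.0), (96.0, 5.0), (81.0, 7.0)],
        "length": [3283.458479, 1431.312901, 2655.910005],
    }
)

# zip(*...) transposes the row tuples into one tuple of first elements
# and one tuple of second elements; pop() drops 'ji' at the same time.
df["a"], df["b"] = zip(*df.pop("ji"))

print(df)
```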