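I want to split the ji column of df into two columns, using the comma as the delimiter. Getting rid of the parentheses around the ji values would be good too. I have tried various approaches and keep getting errors. For now I would like to avoid lambda expressions. Any other ideas?
Example: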
ji length
0 (75.0, 5.0) 3283.458479
1 (96.0, 5.0) 1431.312901
2 (97.0, 5.0) 1364.592959
3 (247.0, 5.0) 3736.322308
4 (81.0, 7.0) 2655.910005
5 (93.0, 7.0) 1752.293687
6 (242.0, 7.0) 427.844417
7 (248.0, 7.0) 3725.823013
8 (254.0, 7.0) 2318.937332
9 (255.0, 7.0) 2292.673905
10 (242.0, 8.0) 145.811907
11 (254.0, 8.0) 2222.447786
12 (255.0, 8.0) 2196.184360
13 (248.0, 9.0) 441.222866
14 (253.0, 9.0) 853.095032
15 (256.0, 9.0) 2076.942682
16 (91.0, 10.0) 1743.310744
17 (93.0, 10.0) 1256.337420
18 (105.0, 10.0) 523.447658
19 (174.0, 10.0) 1530.617012
20 (176.0, 10.0) 1697.614009
21 (248.0, 10.0) 440.000463
22 (253.0, 10.0) 904.706003
23 (256.0, 10.0) 1991.662604
24 (258.0, 10.0) 1850.995862
25 (172.0, 11.0) 1301.179960
26 (174.0, 11.0) 1436.984094
27 (176.0, 11.0) 1695.954099
28 (179.0, 11.0) 1548.015013
29 (228.0, 11.0) 4640.928585
30 (242.0, 11.0) 169.617203
31 (251.0, 11.0) 784.921333
32 (253.0, 11.0) 983.118859
33 (255.0, 11.0) 1181.474433
34 (256.0, 11.0) 1303.398235
Answer 0 (score: 4)
If the ji column contains strings, one solution is to extract the column with pop, then use strip and split with expand=True to get a DataFrame:
print (type(df.loc[0, 'ji']))
<class 'str'>
df[['a','b']] = df.pop('ji').str.strip('()').str.split(', ', expand=True).astype(float)
Or, if there are no missing values and performance matters, use a list comprehension:
L = [x.strip('()').split(', ') for x in df.pop('ji')]
df[['a','b']] = pd.DataFrame(L, index=df.index).astype(float)
print (df)
length a b
0 3283.458479 75.0 5.0
1 1431.312901 96.0 5.0
2 1364.592959 97.0 5.0
3 3736.322308 247.0 5.0
4 2655.910005 81.0 7.0
5 1752.293687 93.0 7.0
6 427.844417 242.0 7.0
7 3725.823013 248.0 7.0
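The string case above can be tried end to end with a small hand-built frame. This is only a sketch: the three rows are copied from the question, and it assumes every value in ji looks exactly like "(x, y)", with a comma followed by one space.

```python
import pandas as pd

# Sample data, assuming 'ji' holds strings such as "(75.0, 5.0)".
df = pd.DataFrame({
    "ji": ["(75.0, 5.0)", "(96.0, 5.0)", "(97.0, 5.0)"],
    "length": [3283.458479, 1431.312901, 1364.592959],
})

# pop removes 'ji' from df and returns it as a Series.
# strip drops the surrounding parentheses, and split with expand=True
# returns one column per piece instead of a Series of lists.
parts = df.pop("ji").str.strip("()").str.split(", ", expand=True).astype(float)

# Assign the two resulting columns under the new names.
df[["a", "b"]] = parts
print(df)
```

Because pop is used, the original ji column is already gone from df, so no separate drop is needed.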
If the values are tuples, build a nested list of tuples and pass it to the DataFrame constructor:
print (type(df.loc[0, 'ji']))
<class 'tuple'>
df[['a','b']] = pd.DataFrame(df.pop('ji').values.tolist(), index=df.index)
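A minimal runnable version of the tuple case, again on made-up sample rows taken from the question. It assumes every tuple has exactly two elements.

```python
import pandas as pd

# Sample data, assuming 'ji' holds real Python tuples rather than strings.
df = pd.DataFrame({
    "ji": [(75.0, 5.0), (96.0, 5.0), (81.0, 7.0)],
    "length": [3283.458479, 1431.312901, 2655.910005],
})

# tolist() yields a plain list of tuples; the DataFrame constructor turns
# each tuple into one row. Passing index=df.index keeps the rows aligned
# even when df does not have a default 0..n-1 index.
df[["a", "b"]] = pd.DataFrame(df.pop("ji").tolist(), index=df.index)
print(df)
```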
Answer 1 (score: 2)
Edit:
If 'ji' contains tuples, it is much simpler:
df[['j', 'i']] = df.pop('ji').apply(pd.Series)
Given
>>> df
ji length
0 (75.0,5.0) 3283.458479
1 (96.0,5.0) 1431.312901
2 (97.0,5.0) 1364.592959
3 (247.0,5.0) 3736.322308
4 (81.0,7.0) 2655.910005
>>>
>>> df.dtypes
ji object
length float64
dtype: object
i.e. the 'ji' column contains strings, I would use ast.literal_eval here.
>>> from ast import literal_eval
>>> def split_to_df(string):
...     return pd.Series(literal_eval(string))
>>>
>>> df[['val1', 'val2']] = df.pop('ji').apply(split_to_df)
>>> df
length val1 val2
0 3283.458479 75.0 5.0
1 1431.312901 96.0 5.0
2 1364.592959 97.0 5.0
3 3736.322308 247.0 5.0
4 2655.910005 81.0 7.0
(The use of pop was inspired by jezrael's answer.)
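One thing literal_eval gives you is that it does not care about the spacing inside the string, so "(75.0,5.0)" and "(96.0, 5.0)" both parse. The sketch below is a variation on the approach above, not the answer's exact code: it parses with a list comprehension and builds the frame in one go instead of calling pd.Series per row. The sample rows are assumed.

```python
from ast import literal_eval

import pandas as pd

# Sample data, assuming 'ji' holds strings with inconsistent spacing.
df = pd.DataFrame({
    "ji": ["(75.0,5.0)", "(96.0, 5.0)", "(81.0,  7.0)"],
    "length": [3283.458479, 1431.312901, 2655.910005],
})

# literal_eval safely parses each string into a real Python tuple.
rows = [literal_eval(s) for s in df.pop("ji")]

# One constructor call turns the list of tuples into two columns.
df[["val1", "val2"]] = pd.DataFrame(rows, index=df.index)
print(df)
```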
Answer 2 (score: 1)
You need:
df['a'] = df['ji'].apply(lambda x: x[0])
df['b'] = df['ji'].apply(lambda x: x[1])
df.drop(['ji'], axis=1, inplace=True)
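Since the question asks to avoid lambda, the same element-wise indexing can be written with the .str accessor, which also indexes into tuples and lists stored in an object column. This is a sketch on assumed sample rows; like the code above, it only works when ji holds tuples, not strings.

```python
import pandas as pd

# Sample data, assuming 'ji' holds real Python tuples.
df = pd.DataFrame({
    "ji": [(75.0, 5.0), (81.0, 7.0)],
    "length": [3283.458479, 2655.910005],
})

# .str[i] picks the i-th element of each tuple; the result is an object
# column, so cast it to float explicitly.
df["a"] = df["ji"].str[0].astype(float)
df["b"] = df["ji"].str[1].astype(float)

# Drop the original column without inplace=True.
df = df.drop(columns="ji")
print(df)
```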