Converting the data type of a Pandas column

Time: 2015-08-09 02:24:39

Tags: python python-2.7 pandas

If the values in the Pandas DataFrame column timestampMs are of type unicode and we want to convert them to float, is there any difference between the following two approaches?

df['timestampMs'].map(lambda x: float(x)/1000)

df['timestampMs'].astype('float')/1000

Since both appear to give the same result, which one is preferred?

1 answer:

Answer 0: (score: 2)

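Well... if you care about speed, the lambda approach is slightly faster for small datasets. For large datasets, use the .astype() method (which I personally also find more readable):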

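# Python 2.7: unicode() and time.clock() are not available in current Python 3.
import time
import timeit
import pandas as pd

num_elements = 100
# Build a list of unicode strings that each hold a float value.
times = [unicode(time.clock()) for x in range(num_elements)]

# Single-column DataFrame; the column label is 0.
df = pd.DataFrame(times)

def first_method():
    # Convert element by element with a Python-level lambda.
    df[0].map(lambda x: float(x)/1000)

def second_method():
    # Convert the whole column at once, then divide.
    df[0].astype('float')/1000

num_reps = 15000

print("First method time for {} reps: {}".format(num_reps, timeit.timeit(first_method, number=num_reps)))
print("Second method time for {} reps: {}".format(num_reps, timeit.timeit(second_method, number=num_reps)))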

With num_elements = 100 I get:

First method time for 15000 reps: 1.95685731342
Second method time for 15000 reps: 2.22381265566

With num_elements = 1000 I get:

First method time for 15000 reps: 12.0774245498
Second method time for 15000 reps: 6.77670391568
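
For anyone repeating this on Python 3, here is a minimal sketch of the same comparison. It assumes str values in place of unicode. The helper names (make_frame, with_map, with_astype), the sample timestamp values, and the repetition count are made up for illustration and are not from the original post. It also checks that the two approaches produce the same values. Timings will vary with machine and pandas version, so no output is shown.

```python
import timeit

import pandas as pd


def make_frame(num_elements):
    # Hypothetical sample data: millisecond timestamps stored as strings,
    # standing in for the unicode values described in the question.
    values = [str(1439087079000 + i) for i in range(num_elements)]
    return pd.DataFrame({"timestampMs": values})


def with_map(df):
    # Element-wise conversion through a Python-level lambda.
    return df["timestampMs"].map(lambda x: float(x) / 1000)


def with_astype(df):
    # Whole-column conversion, followed by a vectorized division.
    return df["timestampMs"].astype("float") / 1000


df = make_frame(1000)

# Compare the two results value by value.
same = bool((with_map(df) == with_astype(df)).all())
print("Results identical:", same)

# Time each approach; the repetition count is an arbitrary choice.
num_reps = 100
for name, func in [("map", with_map), ("astype", with_astype)]:
    elapsed = timeit.timeit(lambda: func(df), number=num_reps)
    print("{} time for {} reps: {:.4f}s".format(name, num_reps, elapsed))
```

Changing the argument to make_frame shows how the gap between the two approaches shifts with column size.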