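I'm having trouble with the pandas.DataFrame.apply function. It seems to cast every value to bool unless I "touch" the DataFrame by adding a new column. This happens whether I apply row-wise or column-wise (i.e. axis=0 or axis=1).
My gut tells me I'm doing something very wrong here, but I can't work out what the problem is.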
from datetime import datetime

import pandas as pd

start_date = datetime(2014, 1, 1)
end_date = datetime(2014, 1, 3)

events = pd.DataFrame({
    "some_boolean_field": True,
    "timestamp": pd.date_range(start_date, end_date, freq='D')
})


def do_stuff(event):
    # Print each row exactly as apply passes it in.
    print(event)
    print("")


def run_experiment(message, df):
    print(message)
    print("**********************************")
    print(df)
    print(df.dtypes)
    df.apply(do_stuff, axis=1)
    print("\n\n")


run_experiment("BEFORE ADDING EXTRA FIELD", events)
events['foo'] = "WTF"  # Insane hack to get pandas to pass the correct row dtypes when applying the `do_stuff` function.
run_experiment("AFTER ADDING EXTRA FIELD", events)
Output:
BEFORE ADDING EXTRA FIELD
**********************************
  some_boolean_field  timestamp
0               True 2014-01-01
1               True 2014-01-02
2               True 2014-01-03
some_boolean_field              bool
timestamp             datetime64[ns]
dtype: object
some_boolean_field    True
timestamp             True
Name: 0, dtype: bool

some_boolean_field    True
timestamp             True
Name: 1, dtype: bool

some_boolean_field    True
timestamp             True
Name: 2, dtype: bool

AFTER ADDING EXTRA FIELD
**********************************
  some_boolean_field  timestamp  foo
0               True 2014-01-01  WTF
1               True 2014-01-02  WTF
2               True 2014-01-03  WTF
some_boolean_field              bool
timestamp             datetime64[ns]
foo                           object
dtype: object
some_boolean_field                   True
timestamp             2014-01-01 00:00:00
foo                                   WTF
Name: 0, dtype: object

some_boolean_field                   True
timestamp             2014-01-02 00:00:00
foo                                   WTF
Name: 1, dtype: object

some_boolean_field                   True
timestamp             2014-01-03 00:00:00
foo                                   WTF
Name: 2, dtype: object
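
For comparison, here is a minimal sketch of reading the same rows without going through apply at all. With axis=1, apply hands each row to the function as a single Series, and a Series has exactly one dtype, which is where the coercion comes in. DataFrame.itertuples yields a plain namedtuple per row instead, so each field keeps its own type. This assumes the goal is only to read each row's values; the variable name rows is just for illustration, and this is not a fix for apply itself.

```python
from datetime import datetime

import pandas as pd

# Same frame as in the question above.
events = pd.DataFrame({
    "some_boolean_field": True,
    "timestamp": pd.date_range(datetime(2014, 1, 1), datetime(2014, 1, 3), freq="D"),
})

# itertuples() yields one namedtuple per row, so no per-row Series
# (and therefore no single shared dtype) is ever built.
rows = list(events.itertuples(index=False))

for row in rows:
    # Each field is accessed by column name and keeps its own type.
    print(type(row.some_boolean_field).__name__, type(row.timestamp).__name__)
```

If the per-row function really needs a Series, the extra string column from the hack above has the same effect of forcing an object dtype, just less explicitly.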