My initial DataFrame (df):
column1 column2 column3 column4
0 criteria_1 criteria_a 1/5/2017 5
1 criteria_1 criteria_b 2/3/2017 3
2 criteria_1 criteria_a 1/10/2017 10
3 criteria_1 criteria_b 2/7/2017 7
4 criteria_1 criteria_b 2/11/2017 11
5 criteria_1 criteria_a 1/13/2017 13
My code:
import numpy as np
import pandas as pd
df = pd.read_csv("C:/Users/Desktop/maxtest.csv")
df['column3'] = pd.to_datetime(df['column3'])
df['max_column3'] = df.groupby(['column1','column2'])['column3'].transform(max)
df['max_column4'] = df.groupby(['column1','column2'])['column4'].transform(max)
df['test'] = np.where(df['column3'] < df['max_column3'],df['column3'],df['max_column4'])
Problem:
I create a df['test'] column and want it to return df['column3'] when the np.where condition is True. When I try this, I get a "TypeError: invalid type promotion" error.
I'm not entirely sure what is causing the error.
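A minimal sketch that reproduces the error without the CSV file, assuming two short stand-in arrays (the names `dates`, `ints`, `cond` and `raised` are made up for this illustration). np.where has to put both branches into one result array, and NumPy has no common dtype for datetime64 and int64:

```python
import numpy as np
import pandas as pd

# Stand-ins for column3 (datetime64) and max_column4 (int64).
dates = pd.to_datetime(pd.Series(["1/5/2017", "1/13/2017"])).to_numpy()
ints = np.array([5, 13], dtype="int64")
cond = np.array([True, False])

try:
    # Both branches must share one dtype; datetime64 and int64 cannot.
    np.where(cond, dates, ints)
    raised = False
except TypeError as exc:
    # The exact exception class and message vary by NumPy version;
    # catching TypeError covers the case described in the question.
    raised = True
    print(type(exc).__name__, exc)
```

The condition itself is fine here. The failure comes only from mixing the two branch types.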
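Answer 0 (score: 0)
See my comment for the explanation. In short, np.where has to build a single array from both branches, and NumPy has no common dtype for datetime64 and int64 values. Casting column3 to str lets both branches be stored together as objects.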
df['column3'] = pd.to_datetime(df['column3'])
df['max_column3'] = df.groupby(['column1','column2'])['column3'].transform(max)
df['max_column4'] = df.groupby(['column1','column2'])['column4'].transform(max)
df['test'] = np.where((df['column3'] < df['max_column3']),df.column3.astype(str),df.max_column4)
Output:
column1 column2 column3 column4 max_column3 max_column4 \
0 criteria_1 criteria_a 2017-01-05 5 2017-01-13 13
1 criteria_1 criteria_b 2017-02-03 3 2017-02-11 11
2 criteria_1 criteria_a 2017-01-10 10 2017-01-13 13
3 criteria_1 criteria_b 2017-02-07 7 2017-02-11 11
4 criteria_1 criteria_b 2017-02-11 11 2017-02-11 11
5 criteria_1 criteria_a 2017-01-13 13 2017-01-13 13
test
0 2017-01-05
1 2017-02-03
2 2017-01-10
3 2017-02-07
4 11
5 13
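To see what the astype(str) workaround leaves in the column, here is a small sketch that rebuilds the sample frame inline instead of reading the CSV (the inline data and the `keys` variable are assumptions for illustration). It suggests that test ends up as an object column holding strings and integers, so the dates in it are no longer datetimes:

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline instead of read from CSV.
df = pd.DataFrame({
    "column1": ["criteria_1"] * 6,
    "column2": ["criteria_a", "criteria_b", "criteria_a",
                "criteria_b", "criteria_b", "criteria_a"],
    "column3": pd.to_datetime(["1/5/2017", "2/3/2017", "1/10/2017",
                               "2/7/2017", "2/11/2017", "1/13/2017"]),
    "column4": [5, 3, 10, 7, 11, 13],
})
keys = ["column1", "column2"]
df["max_column3"] = df.groupby(keys)["column3"].transform("max")
df["max_column4"] = df.groupby(keys)["column4"].transform("max")

# Same expression as in the answer: dates are cast to str first.
df["test"] = np.where(df["column3"] < df["max_column3"],
                      df["column3"].astype(str),
                      df["max_column4"])

print(df["test"].dtype)                     # dtype of the mixed column
print(df["test"].map(type).value_counts())  # Python types stored in it
```

If you need to compare or sort the dates in test later, you would have to parse the string entries back with pd.to_datetime.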
Answer 1 (score: 0)
If you want to keep the datetime format, you can do the following:
df['test'] = df.apply(lambda x: x.column3 if x.column3 < x.max_column3 else x.max_column4, axis=1)
df
Out[1291]:
column1 column2 column3 column4 max_column3 max_column4 \
0 criteria_1 criteria_a 2017-01-05 5 2017-01-13 13
1 criteria_1 criteria_b 2017-02-03 3 2017-02-11 11
2 criteria_1 criteria_a 2017-01-10 10 2017-01-13 13
3 criteria_1 criteria_b 2017-02-07 7 2017-02-11 11
4 criteria_1 criteria_b 2017-02-11 11 2017-02-11 11
5 criteria_1 criteria_a 2017-01-13 13 2017-01-13 13
test
0 2017-01-05 00:00:00
1 2017-02-03 00:00:00
2 2017-01-10 00:00:00
3 2017-02-07 00:00:00
4 11
5 13
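As a possible variation on the row-wise apply, the same selection can be sketched with zip over the three columns, which avoids building a row object for every line. This is only an illustration on inline sample data (the frame construction and the `keys` variable are assumptions), not a claim about which approach is faster on your data:

```python
import pandas as pd

# Sample data from the question, built inline instead of read from CSV.
df = pd.DataFrame({
    "column1": ["criteria_1"] * 6,
    "column2": ["criteria_a", "criteria_b", "criteria_a",
                "criteria_b", "criteria_b", "criteria_a"],
    "column3": pd.to_datetime(["1/5/2017", "2/3/2017", "1/10/2017",
                               "2/7/2017", "2/11/2017", "1/13/2017"]),
    "column4": [5, 3, 10, 7, 11, 13],
})
keys = ["column1", "column2"]
df["max_column3"] = df.groupby(keys)["column3"].transform("max")
df["max_column4"] = df.groupby(keys)["column4"].transform("max")

# Pick the date when it is below the group maximum, else the max of column4.
# Dates stay as Timestamp objects, so the column ends up with object dtype.
df["test"] = [
    c3 if c3 < m3 else m4
    for c3, m3, m4 in zip(df["column3"], df["max_column3"], df["max_column4"])
]
print(df["test"])
```

As with apply, the column mixes Timestamp and integer values, so it stays an object column rather than datetime64.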
Answer 2 (score: 0)
I ended up using a regular function and doing:
import pandas as pd
import numpy as np
df = pd.read_csv("C:/Users/andre_000/Desktop/maxtest.csv")
df['column3'] = pd.to_datetime(df['column3'])
df['max_column3'] = df.groupby(['column1','column2'])['column3'].transform(max)
df['max_column4'] = df.groupby(['column1','column2'])['column4'].transform(max)
def func(row):
    if row['column3'] < row['max_column3']:
        return row['column3']
    else:
        return row['max_column4']
df = df.assign(test=df.apply(func, axis=1))