Question

I have a DataFrame (df) loaded from Excel with pd.read_excel(), and I need to define a new column that holds values of different types, for example:

df['new col'] = df['Date1']
df.loc[condition('Date1'), 'new col'] = 'string'

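Here df['Date1'] is a date column, condition('Date1') checks whether 'Date1' falls within a given range of values, and 'string' is a fixed piece of text. My code raises an error. How can I define the new column?

For the condition() function, let: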

a = df4['Date2'] - pd.Timedelta(2, unit='d')
b = df4['Date2'] + pd.Timedelta(2, unit='d')

condition('Date1')= df['Date1'].between(a, b, inclusive=False)

Answer 1

You only need to simplify the code: calling df4['Date1'].between(a, b) with exclusive bounds already returns a boolean Series, so you can pass that Series straight to loc:

a = df4['Date2'] - pd.Timedelta(2, unit='d')
b = df4['Date2'] + pd.Timedelta(2, unit='d')

# inclusive='neither' replaces inclusive=False, which pandas 1.3 deprecated and 2.0 removed
mask = df4['Date1'].between(a, b, inclusive='neither')

df4['new col'] = df4['Date1']
df4.loc[mask, 'new col'] = 'string'
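
For anyone who wants to try this without the original Excel file, here is a minimal sketch on made-up data. The sample dates are hypothetical. Two things differ from the snippet above and are my assumptions, not part of the original answer: the mask is built with plain comparison operators (which behave the same as an exclusive between() and do not depend on the pandas version), and the new column is cast to object before the string is assigned, so the column can hold both Timestamps and text without relying on implicit upcasting.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet.
df4 = pd.DataFrame({
    "Date1": pd.to_datetime(["2020-01-01", "2020-01-10", "2020-01-20"]),
    "Date2": pd.to_datetime(["2020-01-02", "2020-01-20", "2020-01-20"]),
})

# Row-wise window of two days on either side of Date2.
a = df4["Date2"] - pd.Timedelta(days=2)
b = df4["Date2"] + pd.Timedelta(days=2)

# True where Date1 lies strictly between a and b (both bounds excluded).
mask = (df4["Date1"] > a) & (df4["Date1"] < b)

# Cast to object first so the column can hold Timestamps and str together.
df4["new col"] = df4["Date1"].astype(object)
df4.loc[mask, "new col"] = "string"

print(df4)
```

Only the rows where the mask is True get the fixed text; the other rows keep their original date.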

A better option is numpy.where:

df4['new col'] = np.where(mask, 'string', df4['Date1'])
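
A small self-contained sketch of the numpy.where variant, on hypothetical data with a hand-written mask. The cast to object is my own addition, not part of the original answer: it makes the result dtype explicit instead of leaving it to NumPy to decide how a string and a datetime should be combined.

```python
import numpy as np
import pandas as pd

# Hypothetical data and mask, only for illustration.
df4 = pd.DataFrame({"Date1": pd.to_datetime(["2020-01-01", "2020-01-10"])})
mask = pd.Series([True, False])

# Cast the dates to object so the result can hold Timestamps and str.
dates_as_objects = df4["Date1"].astype(object)

# Where mask is True take the fixed text, otherwise keep the date.
df4["new col"] = np.where(mask, "string", dates_as_objects)

print(df4)
```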

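Note:

Values of different types can be mixed in the same Series, but doing so degrades performance and breaks some functionality, so be careful.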
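
To make the warning concrete, here is a short sketch (hypothetical values) of one thing that stops working once strings and dates share a column: the .dt accessor. The last step, converting back with pd.to_datetime and errors="coerce", is just one possible way to recover the dates, and it turns the text entries into NaT.

```python
import pandas as pd

# A pure datetime Series supports the .dt accessor.
dates = pd.Series(pd.to_datetime(["2020-01-01", "2020-01-10"]))
years = dates.dt.year

# A mixed Series is stored with object dtype.
mixed = pd.Series([pd.Timestamp("2020-01-01"), "string"], dtype=object)

# The .dt accessor is not available on object dtype.
try:
    mixed.dt.year
    dt_failed = False
except AttributeError:
    dt_failed = True

# One way back: coerce non-dates to NaT to get a datetime Series again.
recovered = pd.to_datetime(mixed, errors="coerce")

print(dt_failed)
print(recovered)
```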

Defining a column with different types in a pandas DataFrame
