有一个数据框。 期间 列包含列表。这些列表包含时间跨度。
#load data
df = pd.DataFrame(data, columns=['task_id', 'target_start_date', 'target_end_date'])
df['target_start_date'] = pd.to_datetime(df.target_start_date)
df['target_end_date'] = pd.to_datetime(df.target_end_date)
df['period'] = np.nan
#create period column
z = dict()
freq = 'M'
for i in range(0, len(df)):
l = pd.period_range(df.target_start_date[i], df.target_end_date[i], freq=freq)
l = l.to_native_types()
z[i] = l
df.period = z.values()
输出
task_id target_start_date target_end_date period
0 35851 2019-04-01 07:00:00 2019-04-01 07:00:00 [2019-04]
1 35852 2020-02-26 11:30:00 2020-02-26 11:30:00 [2020-02]
2 35854 2019-05-17 07:00:00 2019-06-01 17:30:00 [2019-05, 2019-06]
3 35855 2019-03-20 11:30:00 2019-04-07 15:00:00 [2019-03, 2019-04]
4 35856 2019-04-06 08:00:00 2019-04-26 19:00:00 [2019-04]
然后我添加称为时间片的列。
#create slices
date_min = df.target_start_date.min()
date_max = df.target_end_date.max()
period = pd.period_range(date_min, date_max, freq=freq)
#add columns
for i in period:
df[str(i)] = np.nan
结果
如果此值在期间列的列表中,如何为True填充Nan值?
答案 0 :(得分:1)
在数据框行上应用函数
def fillit(row):
for i in row.period:
row[i] = True
df.apply(fillit), axis=1)
答案 1 :(得分:0)
我的方法是遍历行和列名并比较值:
import numpy as np
import pandas as pd
# handle assignment error
pd.options.mode.chained_assignment = None
# setup test data
data = {'time': [['2019-04'], ['2019-01'], ['2019-03'], ['2019-06', '2019-05']]}
data = pd.DataFrame(data=data)
# create periods
date_min = data.time.min()[0]
date_max = data.time.max()[0]
period = pd.period_range(date_min, date_max, freq='M')
for i in period:
data[str(i)] = np.nan
# compare and fill data
for index, row in data.iterrows():
for column in data:
if data[column].name in row['time']:
data[column][index] = 'True'
输出:
time 2019-01 2019-02 2019-03 2019-04 2019-05 2019-06
0 [2019-04] NaN NaN NaN True NaN NaN
1 [2019-01] True NaN NaN NaN NaN NaN
2 [2019-03] NaN NaN True NaN NaN NaN
3 [2019-06, 2019-05] NaN NaN NaN NaN True True