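How can I find two consecutive rows in a DataFrame that have the same value (a string) in a given column, and add more rows between them? The DataFrame has a time series index.
For example: if two consecutive rows, indexed 5:30 pm and 6:00 pm respectively, have the same value in column A, I want to add rows between them at 1-minute increments: 5:31 pm, 5:32 pm, ..., 5:59 pm.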
Answer 0 (score: 0)
Here is one way to do it:
import pandas as pd
import numpy as np
# say this is your df:
df = pd.DataFrame(index=pd.date_range(periods=6,
                                      start='12:00', end='12:30'))
df['A'] = [1,1,2,3,3,4]
print(df)
# A
#2019-05-09 12:00:00 1
#2019-05-09 12:06:00 1
#2019-05-09 12:12:00 2
#2019-05-09 12:18:00 3
#2019-05-09 12:24:00 3
#2019-05-09 12:30:00 4
# find positions with same value
ends_idx = np.arange(df.shape[0])[
    (df['A'].diff() == 0).values]
print(ends_idx)
# [1 4]
# create index with additional time stamps
old_index = df.index
# np.unique already returns sorted values, so no extra sort is needed
new_index = pd.DatetimeIndex(np.unique(np.concatenate([
    pd.date_range(start=old_index[i-1],
                  end=old_index[i], freq='min').values
    for i in ends_idx
] + [old_index.values])))
# create a new dataframe
new_df = pd.DataFrame(index=new_index)
# assign a default value
new_df['A'] = np.nan
# assign values from old dataframe
new_df.loc[old_index, 'A'] = df['A']
print(new_df)
# A
#2019-05-09 12:00:00 1.0
#2019-05-09 12:01:00 NaN
#2019-05-09 12:02:00 NaN
#2019-05-09 12:03:00 NaN
#2019-05-09 12:04:00 NaN
#2019-05-09 12:05:00 NaN
#2019-05-09 12:06:00 1.0
#2019-05-09 12:12:00 2.0
#2019-05-09 12:18:00 3.0
#2019-05-09 12:19:00 NaN
#2019-05-09 12:20:00 NaN
#2019-05-09 12:21:00 NaN
#2019-05-09 12:22:00 NaN
#2019-05-09 12:23:00 NaN
#2019-05-09 12:24:00 3.0
#2019-05-09 12:30:00 4.0
Edit: for string values in A, `diff()` will not work, so you can replace the part where we find the positions with:
# find positions with same value
n = df.shape[0]
# place holders:
ends_idx = np.arange(n)
same = np.array([False] * n)
# compare values explicitly
same[1:] = df['A'][1:].values == df['A'][:-1].values
ends_idx = ends_idx[same]
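The two steps above (find the repeated positions, then extend the index) can also be sketched as one small function. This is only a sketch under some assumptions: the helper name `add_rows_between_repeats` and the sample string values are made up for illustration, and the comparison uses `Series.eq` against `Series.shift`, which works for strings as well as numbers. Note that two consecutive missing values are not treated as equal here.

```python
import pandas as pd


def add_rows_between_repeats(df, col="A", freq="min"):
    """Hypothetical helper: insert rows at `freq` spacing between
    consecutive rows whose values in `col` are equal."""
    # True where a row holds the same value as the row before it
    same = df[col].eq(df[col].shift()).to_numpy()
    new_index = df.index
    for i in same.nonzero()[0]:
        # timestamps from the previous row up to the current row
        gap = pd.date_range(start=df.index[i - 1], end=df.index[i], freq=freq)
        new_index = new_index.union(gap)
    # rows that did not exist before get NaN
    return df.reindex(new_index)


# sample data with string values (made up for illustration)
df = pd.DataFrame(
    {"A": ["x", "x", "y", "z", "z", "w"]},
    index=pd.date_range(start="2019-05-09 12:00",
                        end="2019-05-09 12:30", periods=6),
)
result = add_rows_between_repeats(df)
print(result)
```

Only the gaps 12:00 to 12:06 and 12:18 to 12:24 get the extra 1-minute rows; the other gaps are left alone.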
Answer 1 (score: 0)
import pandas as pd
df = pd.DataFrame({'A': [1,1,2,3,3,4]},
                  index=pd.date_range(periods=6,
                                      start='12:00', end='12:30'))
print(df)
A
2019-05-09 12:00:00 1
2019-05-09 12:06:00 1
2019-05-09 12:12:00 2
2019-05-09 12:18:00 3
2019-05-09 12:24:00 3
2019-05-09 12:30:00 4
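# upsample to a 1-minute frequency; the new rows get NaN
# (note: this fills every gap, not only the gaps between equal values)
df = df.asfreq('min')
print(df)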
A
2019-05-09 12:00:00 1.0
2019-05-09 12:01:00 NaN
2019-05-09 12:02:00 NaN
2019-05-09 12:03:00 NaN
2019-05-09 12:04:00 NaN
2019-05-09 12:05:00 NaN
2019-05-09 12:06:00 1.0
2019-05-09 12:07:00 NaN
2019-05-09 12:08:00 NaN
2019-05-09 12:09:00 NaN
2019-05-09 12:10:00 NaN
2019-05-09 12:11:00 NaN
2019-05-09 12:12:00 2.0
2019-05-09 12:13:00 NaN
2019-05-09 12:14:00 NaN
2019-05-09 12:15:00 NaN
2019-05-09 12:16:00 NaN
2019-05-09 12:17:00 NaN
2019-05-09 12:18:00 3.0
2019-05-09 12:19:00 NaN
2019-05-09 12:20:00 NaN
2019-05-09 12:21:00 NaN
2019-05-09 12:22:00 NaN
2019-05-09 12:23:00 NaN
2019-05-09 12:24:00 3.0
2019-05-09 12:25:00 NaN
2019-05-09 12:26:00 NaN
2019-05-09 12:27:00 NaN
2019-05-09 12:28:00 NaN
2019-05-09 12:29:00 NaN
2019-05-09 12:30:00 4.0
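As the output shows, `asfreq('min')` adds rows to every gap. If you only want the rows between two equal values, one possible follow-up is to filter the upsampled frame afterwards. This is a rough sketch, assuming column A has no missing values in the original data; the names `full`, `prev_val`, `next_val` and `keep` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 1, 2, 3, 3, 4]},
    index=pd.date_range(start="2019-05-09 12:00",
                        end="2019-05-09 12:30", periods=6),
)

# upsample to 1-minute frequency; new rows hold NaN
full = df.asfreq("min")

# nearest original value before and after each row
prev_val = full["A"].ffill()
next_val = full["A"].bfill()

# keep the original rows, plus new rows whose neighbours on both sides agree
keep = full["A"].notna() | (prev_val == next_val)
result = full[keep]
print(result)
```

The new rows are still NaN in A; only the choice of which rows to keep uses the forward and backward filled values.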