I use a filter to check conditions in a DataFrame so that I can flag the matching rows.
filtering = (dfsamen.shift(0).moving=='movingToclose') & (more conditions)
dffilter = pd.DataFrame(data=filtering, columns = ['filter'])
dffilter['DateTime'] = dfsamen['DateTime']
Output:
filtering
4 False
5 False
6 True
7 True
dffilter
4 False 2018-06-03 06:33:38.593
5 False 2018-06-03 06:33:39.197
6 True 2018-06-03 06:33:40.597
7 True 2018-06-03 06:33:41.800
But then I used the same code with different conditions, and it does not work:
filtering2 = (dfsamen.shift(0).Input5==1) | (more conditions)
dffilter2 = pd.DataFrame(data=filtering2, columns=['filter2'])
dffilter2['DateTime'] = dfsamen['DateTime']
Output:
filtering2
4 False
5 True
6 True
7 True
dffilter2 (before adding DateTime)
Empty DataFrame
Columns: [filter2]
Index: []
dffilter2 (with DateTime)
4 NaN 2018-06-03 06:33:38.593
5 NaN 2018-06-03 06:33:39.197
6 NaN 2018-06-03 06:33:40.597
7 NaN 2018-06-03 06:33:41.800
So why is the data missing from my second DataFrame, even though filtering2 contains data and I pass it in as the column?
Answer 0 (score: 1)
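The problem is in the DataFrame constructor: it creates a default RangeIndex, so the two DataFrames can end up with different indices. The data then does not align, and rows whose index values differ get NaN in the column.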
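The alignment behaviour described above can be reproduced in isolation. The following is only a minimal sketch with made-up data (the names `dates` and `frame` are hypothetical and not from the question): assigning a Series to a column matches rows by index label, so when no labels are shared every value is missing, while assigning the underlying array fills the column by position.

```python
import pandas as pd

# Hypothetical data: a Series labelled 4..6 and a frame with the default RangeIndex 0..2.
dates = pd.Series(pd.date_range('2018-06-03', periods=3), index=[4, 5, 6])
frame = pd.DataFrame({'filter': [False, True, True]})  # index is 0, 1, 2

# Column assignment aligns on index labels; none are shared, so every row is missing.
frame['DateTime'] = dates
n_missing_aligned = int(frame['DateTime'].isna().sum())

# Assigning the underlying array skips alignment and fills the column by position.
frame['DateTime'] = dates.values
n_missing_positional = int(frame['DateTime'].isna().sum())

print(n_missing_aligned, n_missing_positional)
```

Comparing the two counts shows whether a missing column comes from label mismatch rather than from the data itself.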
The solution is to convert the values to a NumPy array:
filtering = (dfsamen.shift(0).moving=='movingToclose') & (more conditions)
dffilter = pd.DataFrame(data=filtering.values, columns=['filter'])
dffilter['DateTime'] = dfsamen['DateTime'].values
print(dffilter)
Sample:
import pandas as pd

dfsamen = pd.DataFrame({
    'A': list('abc'),
    'DateTime': pd.date_range('2015-01-01', periods=3),
    'C': [7, 8, 9]
}, index=[4, 5, 6])
print(dfsamen)
   A   DateTime  C
4  a 2015-01-01  7
5  b 2015-01-02  8
6  c 2015-01-03  9
filtering = dfsamen.A == 'a'
# .values drops the index, so the new DataFrame gets a fresh RangeIndex
dffilter = pd.DataFrame(data=filtering.values, columns=['filter'])
dffilter['DateTime'] = dfsamen['DateTime'].values
print(dffilter)
   filter   DateTime
0    True 2015-01-01
1   False 2015-01-02
2   False 2015-01-03
Or use Series.to_frame to convert the Series to a one-column DataFrame:
filtering = dfsamen.A == 'a'
# to_frame keeps the original index (4, 5, 6) and names the column 'filter'
dffilter = filtering.to_frame('filter')
dffilter['DateTime'] = dfsamen['DateTime'].values
print(dffilter)
   filter   DateTime
4    True 2015-01-01
5   False 2015-01-02
6   False 2015-01-03
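The two approaches differ in which index the result carries. As a rough sketch reusing the sample data (the names `by_position` and `by_label` are hypothetical, introduced only for this comparison): building from `.values` yields a fresh RangeIndex, while `to_frame` keeps the original labels, in which case the DateTime column can also be assigned directly because the labels already match.

```python
import pandas as pd

dfsamen = pd.DataFrame({
    'A': list('abc'),
    'DateTime': pd.date_range('2015-01-01', periods=3),
}, index=[4, 5, 6])

filtering = dfsamen['A'] == 'a'

# From the raw array: the original labels are dropped and a RangeIndex is created.
by_position = pd.DataFrame(data=filtering.values, columns=['filter'])

# From to_frame: the original labels are kept, so label-based assignment lines up.
by_label = filtering.to_frame('filter')
by_label['DateTime'] = dfsamen['DateTime']

print(by_position.index.tolist())
print(by_label.index.tolist())
print(by_label['DateTime'].isna().any())
```

Which one to prefer depends on whether later steps need the original row labels, for example to join the flags back onto dfsamen.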