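I have a dataframe holding a time series, in which one column contains the strings 'Normal Value' and 'Wrong Value'. I want to find all rows that lie between two rows with 'Wrong Value' and assign them 0 in a new column. Rows that have 'Normal Value' and do not lie between rows with 'Wrong Value' should get the value 1. The Value column shows the high peaks in the time series.
Sample dataframe: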
>>> df = pd.DataFrame({'Date': ['2019-01-01','2019-01-02','2019-01-03','2019-01-04','2019-01-05','2019-01-06','2019-01-07','2019-01-08','2019-01-09', '2019-01-10'],
... 'Value': [-0.011295, -0.013431, 580944.426061, 0.000000, 0.000000, -0.999998, 0.000000, 0.000000, 712327.147257, -0.999999],
... 'String': ['Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Wrong Value']})
>>> df
Date Value String
0 2019-01-01 -0.011295 Normal Value
1 2019-01-02 -0.013431 Normal Value
2 2019-01-03 580944.426061 Wrong Value
3 2019-01-04 0.000000 Normal Value
4 2019-01-05 0.000000 Normal Value
5 2019-01-06 -0.999998 Wrong Value
6 2019-01-07 0.000000 Normal Value
7 2019-01-08 0.000000 Normal Value
8 2019-01-09 712327.147257 Wrong Value
9 2019-01-10 -0.999999 Wrong Value
Expected output:
>>> df = pd.DataFrame({'Date': ['2019-01-01','2019-01-02','2019-01-03','2019-01-04','2019-01-05','2019-01-06','2019-01-07','2019-01-08','2019-01-09', '2019-01-10'],
... 'Value': [-0.011295, -0.013431, 580944.426061, 0.000000, 0.000000, -0.999998, 0.000000, 0.000000, 712327.147257, -0.999999],
... 'String': ['Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Wrong Value'],
... 'Expected Value': [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]})
>>> df
Date Value String Expected Value
0 2019-01-01 -0.011295 Normal Value 1
1 2019-01-02 -0.013431 Normal Value 1
2 2019-01-03 580944.426061 Wrong Value 0
3 2019-01-04 0.000000 Normal Value 0
4 2019-01-05 0.000000 Normal Value 0
5 2019-01-06 -0.999998 Wrong Value 0
6 2019-01-07 0.000000 Normal Value 1
7 2019-01-08 0.000000 Normal Value 1
8 2019-01-09 712327.147257 Wrong Value 0
9 2019-01-10 -0.999999 Wrong Value 0
Answer 0: (score: 0)
Basically, what you want is to transform this list
[1,1,0,1,1,0,1,1,0,0,...]
(1 for a normal value, 0 for a wrong value)
into:
[1,1,0,0,0,0,1,1,0,0,...]
A simple for loop can do the job:
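a = []
is_wrong = 0  # 1 while we are inside an open 'Wrong Value' interval
for value in df['String'].values:
    if is_wrong == 0:
        if value == 'Normal Value':
            a.append(1)
        else:
            a.append(0)
            is_wrong = 1  # this 'Wrong Value' opens an interval
    else:
        if value == 'Normal Value':
            a.append(0)
        else:
            a.append(0)
            is_wrong = 0  # this 'Wrong Value' closes the interval
df['Expected Value'] = a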
A (perhaps) more elegant way might be:
a = []
is_wrong = False # store the current state
for value in df['String'].map({'Normal Value':True,'Wrong Value':False}).values:
    a.append(value and not is_wrong) # check the current state and output value
    is_wrong = is_wrong if value else not is_wrong # change the state if needed
df['Expected Value'] = [int(x) for x in a]
In both cases, the result is:
df['Expected Value'] = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
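The toggling state used in both loops can also be expressed without an explicit loop. The following is only a sketch, assuming the column contains nothing but the two strings from the question: a running count of 'Wrong Value' rows is odd exactly while an interval is open, so a row gets 1 only if it is a normal row and the count so far is even. The names is_wrong, wrong_count and inside are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'String': ['Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value',
               'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value',
               'Wrong Value', 'Wrong Value'],
})

# True on the rows that carry 'Wrong Value'
is_wrong = df['String'].eq('Wrong Value')

# Number of 'Wrong Value' rows seen so far, including the current row
wrong_count = is_wrong.cumsum()

# An odd count means an interval has been opened and not yet closed
inside = wrong_count % 2 == 1

# 1 only for normal rows that are outside every interval
df['Expected Value'] = (~is_wrong & ~inside).astype(int)
print(df['Expected Value'].tolist())
```

Like the loops above, this treats an unmatched trailing 'Wrong Value' as opening an interval that runs to the end of the data.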
Answer 1: (score: 0)
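There are a few pandas functions that operate on adjacent rows: Series.diff, Series.pct_change, or DataFrame.shift, but I would basically solve this with a loop and if clauses (or a trivial state machine with the two states "between" and "not between").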
import pandas as pd
df = pd.DataFrame({'Date': ['2019-01-01','2019-01-02','2019-01-03','2019-01-04','2019-01-05','2019-01-06','2019-01-07','2019-01-08','2019-01-09', '2019-01-10'],
'Value': [-0.011295, -0.013431, 580944.426061, 0.000000, 0.000000, -0.999998, 0.000000, 0.000000, 712327.147257, -0.999999],
'String': ['Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Wrong Value']})
# (current state, input string) -> (next state, output value)
# state 1 means "between two Wrong Value rows", state 0 means "not between"
state_machine = {(0,"Normal Value"): (0,1),
                 (0,"Wrong Value") : (1,0),
                 (1,"Normal Value"): (1,0),
                 (1,"Wrong Value") : (0,0),
                 }
state=0
expected_values = []
for s in df['String']:
    state, expected = state_machine[state,s]
    expected_values.append(expected)
df['Expected Value'] = expected_values
print(df)
Date Value String Expected Value
0 2019-01-01 -0.011295 Normal Value 1
1 2019-01-02 -0.013431 Normal Value 1
2 2019-01-03 580944.426061 Wrong Value 0
3 2019-01-04 0.000000 Normal Value 0
4 2019-01-05 0.000000 Normal Value 0
5 2019-01-06 -0.999998 Wrong Value 0
6 2019-01-07 0.000000 Normal Value 1
7 2019-01-08 0.000000 Normal Value 1
8 2019-01-09 712327.147257 Wrong Value 0
9 2019-01-10 -0.999999 Wrong Value 0
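To illustrate why the adjacent-row functions mentioned in this answer are not enough by themselves, here is a small sketch using Series.diff on a 0/1 flag. It only compares each row with the one directly above it, so it can mark where a 'Wrong Value' run starts or ends but cannot tell that rows 3 and 4 lie inside an open interval. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series(['Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value',
               'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value',
               'Wrong Value', 'Wrong Value'])

# 1 on 'Wrong Value' rows, 0 elsewhere
is_wrong = s.eq('Wrong Value').astype(int)

# Difference to the previous row: +1 entering a 'Wrong Value' row,
# -1 leaving one, 0 when nothing changed; the first row has no predecessor (NaN)
step = is_wrong.diff()
print(step.iloc[1:].tolist())
```

Rows 3 and 4 both get a step of -1 or 0, exactly like rows 6 and 7, which is why some carried state (the loop or the state machine) is needed to tell them apart.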
Answer 2: (score: 0)
Python code:
import pandas as pd

def condition(x):
    # Map 'Wrong Value' to 0 and anything else to 1
    if x == 'Wrong Value':
        return 0
    return 1

# Named data rather than dict so the built-in dict is not shadowed
data = {'Date': ['2019-01-01','2019-01-02','2019-01-03','2019-01-04','2019-01-05','2019-01-06','2019-01-07','2019-01-08','2019-01-09', '2019-01-10'],
        'Value': [-0.011295, -0.013431, 580944.426061, 0.000000, 0.000000, -0.999998, 0.000000, 0.000000, 712327.147257, -0.999999],
        'String': ['Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Wrong Value']
        }
df = pd.DataFrame(data)
new_df = df['String'].apply(condition)
idx = df.index[new_df < 1]  # index labels of the 'Wrong Value' rows
# Walk through the 'Wrong Value' rows in pairs; stopping at len(idx) - 1
# avoids an IndexError when the number of 'Wrong Value' rows is odd
for i in range(0, len(idx) - 1, 2):
    if idx[i+1] - idx[i] > 1:
        new_df.loc[idx[i]:idx[i+1]] = 0  # .loc slicing includes both ends
df['Expected Value'] = new_df
print(df)
Output:
Date Value String Expected Value
0 2019-01-01 -0.011295 Normal Value 1
1 2019-01-02 -0.013431 Normal Value 1
2 2019-01-03 580944.426061 Wrong Value 0
3 2019-01-04 0.000000 Normal Value 0
4 2019-01-05 0.000000 Normal Value 0
5 2019-01-06 -0.999998 Wrong Value 0
6 2019-01-07 0.000000 Normal Value 1
7 2019-01-08 0.000000 Normal Value 1
8 2019-01-09 712327.147257 Wrong Value 0
9 2019-01-10 -0.999999 Wrong Value 0
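The pairing of 'Wrong Value' positions in the code above can also be written with zip over two strided slices, which may be easier to read. This is a sketch rather than the answerer's code: mark_between is a hypothetical helper, and it assumes that an unmatched trailing 'Wrong Value' should simply be left alone (zip stops at the shorter slice), which differs from the loop-based answers, where such a row opens an interval that runs to the end.

```python
import pandas as pd

def mark_between(strings):
    """Hypothetical helper: 1 for normal rows outside every pair, else 0."""
    flags = pd.Series(list(strings)).ne('Wrong Value').astype(int)
    # Positions of the 'Wrong Value' rows (the Series has a default 0..n-1 index)
    wrong_pos = flags.index[flags == 0]
    # Pair 1st with 2nd, 3rd with 4th, ...; an unpaired last position is dropped
    for start, end in zip(wrong_pos[::2], wrong_pos[1::2]):
        flags.loc[start:end] = 0  # label slicing includes both ends
    return flags.tolist()

result = mark_between(['Normal Value', 'Normal Value', 'Wrong Value',
                       'Normal Value', 'Normal Value', 'Wrong Value',
                       'Normal Value', 'Normal Value', 'Wrong Value',
                       'Wrong Value'])
print(result)
```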