How do I assign a value to the rows between two rows that contain a specific string in a DataFrame column?

Time: 2019-10-15 09:43:12

Tags: python pandas dataframe

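I have a DataFrame holding a time series, in which one column contains the strings Normal Value or Wrong Value. I want to find all rows that lie between rows with Wrong Value and assign 0 to them in a new column. Rows with Normal Value that are not between rows with Wrong Value should get the value 1. The Value column shows the high spikes in the time series.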

Example DataFrame:

>>> df = pd.DataFrame({'Date': ['2019-01-01','2019-01-02','2019-01-03','2019-01-04','2019-01-05','2019-01-06','2019-01-07','2019-01-08','2019-01-09', '2019-01-10'],
...                    'Value': [-0.011295, -0.013431, 580944.426061, 0.000000, 0.000000, -0.999998, 0.000000, 0.000000, 712327.147257, -0.999999],
...                    'String': ['Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Wrong Value']})
>>> df
         Date          Value        String
0  2019-01-01      -0.011295  Normal Value
1  2019-01-02      -0.013431  Normal Value
2  2019-01-03  580944.426061   Wrong Value
3  2019-01-04       0.000000  Normal Value
4  2019-01-05       0.000000  Normal Value
5  2019-01-06      -0.999998   Wrong Value
6  2019-01-07       0.000000  Normal Value
7  2019-01-08       0.000000  Normal Value
8  2019-01-09  712327.147257   Wrong Value
9  2019-01-10      -0.999999   Wrong Value

Expected output:

>>> df = pd.DataFrame({'Date': ['2019-01-01','2019-01-02','2019-01-03','2019-01-04','2019-01-05','2019-01-06','2019-01-07','2019-01-08','2019-01-09', '2019-01-10'],
...                    'Value': [-0.011295, -0.013431, 580944.426061, 0.000000, 0.000000, -0.999998, 0.000000, 0.000000, 712327.147257, -0.999999],
...                    'String': ['Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Wrong Value'],
...                    'Expected Value': [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]})
>>> df
         Date          Value        String  Expected Value
0  2019-01-01      -0.011295  Normal Value               1
1  2019-01-02      -0.013431  Normal Value               1
2  2019-01-03  580944.426061   Wrong Value               0
3  2019-01-04       0.000000  Normal Value               0
4  2019-01-05       0.000000  Normal Value               0
5  2019-01-06      -0.999998   Wrong Value               0
6  2019-01-07       0.000000  Normal Value               1
7  2019-01-08       0.000000  Normal Value               1
8  2019-01-09  712327.147257   Wrong Value               0
9  2019-01-10      -0.999999   Wrong Value               0

3 answers:

Answer 0 (score: 0)

Basically, what you want is to convert this list [1,1,0,1,1,0,1,1,0,0,...] (1 for a normal value, 0 for a wrong value) into [1,1,0,0,0,0,1,1,0,0,...].

A simple for loop does the job:

a = []
is_wrong = 0  # 1 while we are between an opening and a closing 'Wrong Value' row
for value in df['String'].values:
    if is_wrong == 0:
        if value == 'Normal Value':
            a.append(1)
        else:
            a.append(0)
            is_wrong = 1
    else:
        if value == 'Normal Value':
            a.append(0)
        else:
            a.append(0)
            is_wrong = 0
df['Expected Value'] = a

A (perhaps) more elegant way could be:

a = []
is_wrong = False # store the current state
for value in df['String'].map({'Normal Value':True,'Wrong Value':False}).values:
    a.append(value and not is_wrong) # check the current state and output value
    is_wrong = is_wrong if value else not is_wrong # change the state if needed
df['Expected Value'] = [int(x) for x in a]

In both cases, the result is:

df['Expected Value'] = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
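The same transformation can also be written without an explicit loop. This is only a sketch, assuming the column holds exactly the two labels from the question: a running count of the Wrong Value rows is odd exactly while a pair is open, so a row gets 1 only when it is not a marker itself and the count so far is even. The names is_wrong and inside_pair are illustrative and not from the original post.

```python
import pandas as pd

# Only the 'String' column from the question is needed for this sketch.
df = pd.DataFrame({
    'String': ['Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value',
               'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value',
               'Wrong Value', 'Wrong Value'],
})

# True on every marker row.
is_wrong = df['String'].eq('Wrong Value')

# Running count of markers seen so far (including the current row).
# An odd count means an opening marker has not been closed yet.
inside_pair = is_wrong.cumsum() % 2 == 1

# 1 only for rows that are neither a marker nor inside an open pair.
df['Expected Value'] = (~is_wrong & ~inside_pair).astype(int)

print(df['Expected Value'].tolist())
# [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
```

Like the loops above, this treats each Wrong Value row as a toggle, so an unmatched final marker leaves every row after it at 0.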

Answer 1 (score: 0)

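pandas has a few functions that operate on adjacent rows, such as Series.diff, Series.pct_change and DataFrame.shift, but I would basically solve this with a loop and if clauses (or a trivial state machine with two states, "between" and "not between").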

import pandas as pd

df = pd.DataFrame({'Date': ['2019-01-01','2019-01-02','2019-01-03','2019-01-04','2019-01-05','2019-01-06','2019-01-07','2019-01-08','2019-01-09', '2019-01-10'],
                   'Value': [-0.011295, -0.013431, 580944.426061, 0.000000, 0.000000, -0.999998, 0.000000, 0.000000, 712327.147257, -0.999999],
                   'String': ['Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Wrong Value']})

state_machine = {(0,"Normal Value"): (0,1),
                 (0,"Wrong Value") : (1,0),
                 (1,"Normal Value"): (1,0),
                 (1,"Wrong Value") : (0,0),
                }
state=0
expected_values = []
for s in df['String']:
    state, expected = state_machine[state,s]
    expected_values.append(expected)
df['Expected Value'] = expected_values

print(df)

         Date          Value        String  Expected Value
0  2019-01-01      -0.011295  Normal Value               1
1  2019-01-02      -0.013431  Normal Value               1
2  2019-01-03  580944.426061   Wrong Value               0
3  2019-01-04       0.000000  Normal Value               0
4  2019-01-05       0.000000  Normal Value               0
5  2019-01-06      -0.999998   Wrong Value               0
6  2019-01-07       0.000000  Normal Value               1
7  2019-01-08       0.000000  Normal Value               1
8  2019-01-09  712327.147257   Wrong Value               0
9  2019-01-10      -0.999999   Wrong Value               0
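One property of this toggle logic is worth seeing on a different input: if the number of Wrong Value rows is odd, the last marker opens a region that is never closed, so every row after it gets 0. The sketch below wraps the same two-state idea in a small function to show that. The function name flag_outside_wrong_pairs and the sample labels are made up for illustration, and whether this is the desired behaviour for unpaired markers is not stated in the question.

```python
import pandas as pd

def flag_outside_wrong_pairs(labels, marker="Wrong Value"):
    """Return 1 for rows outside any marker pair, 0 for markers and rows between them."""
    inside = False  # True while an opening marker has not been closed yet
    flags = []
    for label in labels:
        if label == marker:
            inside = not inside  # each marker toggles the state
            flags.append(0)      # marker rows themselves are always 0
        else:
            flags.append(0 if inside else 1)
    return flags

# Hypothetical input with three markers, so the last one is unpaired.
labels = pd.Series(["Normal Value", "Wrong Value", "Normal Value", "Wrong Value",
                    "Normal Value", "Wrong Value", "Normal Value"])

print(flag_outside_wrong_pairs(labels))
# [1, 0, 0, 0, 1, 0, 0]
```

If rows after an unpaired marker should count as normal instead, the markers would have to be paired up first rather than toggled one at a time.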

Answer 2 (score: 0)

Python code:

import pandas as pd

def condition(x):
  if x == 'Wrong Value':
    return 0
  return 1

# Named `data` rather than `dict` so the built-in dict type is not shadowed
data = {'Date': ['2019-01-01','2019-01-02','2019-01-03','2019-01-04','2019-01-05','2019-01-06','2019-01-07','2019-01-08','2019-01-09', '2019-01-10'],
        'Value': [-0.011295, -0.013431, 580944.426061, 0.000000, 0.000000, -0.999998, 0.000000, 0.000000, 712327.147257, -0.999999],
        'String': ['Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Normal Value', 'Normal Value', 'Wrong Value', 'Wrong Value']
       }
df = pd.DataFrame(data)

new_df = df['String'].apply(condition)
idx = df.index[new_df < 1]

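# Walk the 'Wrong Value' positions in pairs (opening, closing).
# Stopping at len(idx) - 1 avoids an IndexError when the number of markers is odd.
for i in range(0, len(idx) - 1, 2):
  if idx[i+1] - idx[i] > 1:
    # .loc slicing is label-based and includes both endpoints
    new_df.loc[idx[i]:idx[i+1]] = 0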

df['Expected Value'] = new_df
print (df)

Output:

         Date          Value        String  Expected Value
0  2019-01-01      -0.011295  Normal Value               1
1  2019-01-02      -0.013431  Normal Value               1
2  2019-01-03  580944.426061   Wrong Value               0
3  2019-01-04       0.000000  Normal Value               0
4  2019-01-05       0.000000  Normal Value               0
5  2019-01-06      -0.999998   Wrong Value               0
6  2019-01-07       0.000000  Normal Value               1
7  2019-01-08       0.000000  Normal Value               1
8  2019-01-09  712327.147257   Wrong Value               0
9  2019-01-10      -0.999999   Wrong Value               0