How do I add values to a new column based on a condition?

Asked: 2019-05-20 13:50:16

Tags: python dataframe

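I am trying to add a new column to a dataset based on a condition, but the resulting DataFrame is not what I expect.

I have tried a few approaches, and this is the closest I have come.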

import pandas as pd

data = {'Date': ['3-Mar', '20-Mar', '20-Apr', '21-Apr', '29-Apr', '7-May',
                 '30-May', '31-May', '7-Jun', '16-Jun',
                 '1-Jul', '2-Jul', '10-Jul'],
        'Value': [0.5840, 0.8159, 0.7789, 0.7665, 0.8510, 0.7428, 0.7124,
                  0.6820, 0.8714, 0.8902, 0.8596, 0.8289, 0.6877]}
frame = pd.DataFrame(data)

for counter, value in enumerate(frame['Value']):
    if value >= 0.7:
        frame = frame.append({'result': 'High'}, ignore_index=True)   
    else:
        frame = frame.append({'result': 'Low'}, ignore_index=True)   

print(frame)

The result is:

     Date   Value result
0    3-Mar  0.5840    NaN
1   20-Mar  0.8159    NaN
2   20-Apr  0.7789    NaN
3   21-Apr  0.7665    NaN
4   29-Apr  0.8510    NaN
5    7-May  0.7428    NaN
6   30-May  0.7124    NaN
7   31-May  0.6820    NaN
8    7-Jun  0.8714    NaN
9   16-Jun  0.8902    NaN
10   1-Jul  0.8596    NaN
11   2-Jul  0.8289    NaN
12  10-Jul  0.6877    NaN
13     NaN     NaN    Low
14     NaN     NaN   High
15     NaN     NaN   High
16     NaN     NaN   High
17     NaN     NaN   High
18     NaN     NaN   High
19     NaN     NaN   High
20     NaN     NaN    Low
21     NaN     NaN   High
22     NaN     NaN   High
23     NaN     NaN   High
24     NaN     NaN   High
25     NaN     NaN    Low

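However, I expected these values to be placed next to the existing values, not in new rows.

Thanks!

3 Answers:

Answer 0: (score: 1)

If you look at the documentation for the append function, you will see that it appends rows to the end of the DataFrame, which is not what you want: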

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html
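The difference between appending rows and assigning a column can be sketched as follows. This is only a minimal illustration on a shortened, three-row version of the question's data; the variable name labels is introduced here for the example and does not appear in the original post.

```python
import pandas as pd

# Shortened version of the question's data (three rows only).
frame = pd.DataFrame({'Date': ['3-Mar', '20-Mar', '20-Apr'],
                      'Value': [0.5840, 0.8159, 0.7789]})

# Build one label per existing row instead of appending new rows.
labels = []
for value in frame['Value']:
    labels.append('High' if value >= 0.7 else 'Low')

# Assigning a list with one entry per row creates a column
# next to the existing values; the row count does not change.
frame['result'] = labels
print(frame)
```

The loop from the question is kept on purpose so that the only change is where the labels end up: in a list that becomes a column, rather than in new rows.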

You can do this with a lambda function, which effectively walks through each row and applies the logic you need.

frame['result'] = frame['Value'].apply(lambda x: 'High' if x >= 0.7 else 'Low')
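As a possible alternative that the answer itself does not mention, the same labelling can be written without a per-value Python function by using numpy.where on the whole column. The sketch below assumes NumPy is available alongside pandas and uses a three-value sample with the question's >= 0.7 threshold.

```python
import numpy as np
import pandas as pd

# Three sample values taken from the question's data.
frame = pd.DataFrame({'Value': [0.5840, 0.8159, 0.6820]})

# np.where evaluates the condition for the whole column at once and
# picks 'High' where it is True and 'Low' where it is False.
frame['result'] = np.where(frame['Value'] >= 0.7, 'High', 'Low')
print(frame)
```

For a column this small the two approaches are interchangeable; the vectorized form mainly pays off on larger frames.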

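Answer 1: (score: 0)

If I understand correctly, this may already have been answered, but here you go.

You need to create a new column, result.

Define a function (for readability) that takes a value and returns the result: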

def udf(value):
    # Label a single value against the 0.7 threshold.
    if value >= .7:
        return "High"
    else:
        return "Low"

Then apply this function to the Value column:

frame['result'] = frame['Value'].apply(udf)

I suggest reading the documentation for DataFrame.apply.

Answer 2: (score: 0)

Using pandas.Series solves your problem:
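If the threshold should not be hard-coded, one option is to give the function an extra parameter and let apply forward it as a keyword argument. This is only a sketch: the names label_value and threshold are hypothetical and not part of the answer above.

```python
import pandas as pd

def label_value(value, threshold):
    """Return 'High' when value reaches the threshold, otherwise 'Low'."""
    if value >= threshold:
        return "High"
    return "Low"

# Three sample values taken from the question's data.
frame = pd.DataFrame({'Value': [0.5840, 0.8159, 0.6820]})

# Series.apply passes extra keyword arguments through to the function.
frame['result'] = frame['Value'].apply(label_value, threshold=0.7)
print(frame)
```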

import pandas as pd

data = {'Date' : ['3-Mar', '20-Mar', '20-Apr', '21-Apr', '29-Apr', '7- May', 
                  '30-May', '31-May', '7-Jun', '16-Jun','1-Jul', '2-Jul', '10-Jul'],
        'Value' : [0.5840, 0.8159, 0.7789, 0.7665, 0.8510, 0.7428, 0.7124, 
                   0.6820, 0.8714, 0.8902, 0.8596, 0.8289, 0.6877]}
frame = pd.DataFrame(data)
frame['result'] = pd.Series(['High' if x >= 0.7 else 'Low' for x in frame['Value']])

Output:

      Date   Value result
0    3-Mar  0.5840    Low
1   20-Mar  0.8159   High
2   20-Apr  0.7789   High
3   21-Apr  0.7665   High
4   29-Apr  0.8510   High
5   7- May  0.7428   High
6   30-May  0.7124   High
7   31-May  0.6820    Low
8    7-Jun  0.8714   High
9   16-Jun  0.8902   High
10   1-Jul  0.8596   High
11   2-Jul  0.8289   High
12  10-Jul  0.6877    Low
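One thing to keep in mind with this approach: a Series assigned to a column is aligned on its index, not on position. The frame above has the default 0 to 12 index, so the new Series lines up. The sketch below uses a hypothetical frame with a different index (not from the question) to show how a plain list behaves differently.

```python
import pandas as pd

# Hypothetical frame with a non-default index, to show the alignment effect.
frame = pd.DataFrame({'Value': [0.5840, 0.8159, 0.6820]}, index=[10, 11, 12])

labels = ['High' if x >= 0.7 else 'Low' for x in frame['Value']]

# A plain list is assigned by position, so every row gets its label.
frame['by_list'] = labels

# A new Series carries the index 0, 1, 2, which matches none of 10, 11, 12,
# so index alignment leaves this column filled with NaN.
frame['by_series'] = pd.Series(labels)
print(frame)
```

Assigning the list directly, or passing index=frame.index when building the Series, avoids the mismatch.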