I have the following data, and I want to create a new column based on certain conditions. See below:
Dataset:
real,rel
1,0
0,1
1,1
0,1
0,0
0,0
1,1
1,1
0,0
0,1
1,0
1,1
0,1
1,0
The code I tried and the error I got:
>>> import pandas as pd
>>> df = pd.read_csv("test.csv")
>>> df.loc[df["real"]==0 and df["rel"]==0,"out"] = 9
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python35\lib\site-packages\pandas\core\generic.py", line 1576, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
My conditions for the out column are:
when real is 0 and rel is 0, out should be 0
when real is 1 and rel is 1, out should be 1
when real is 1 and rel is 0, out should be 2
when real is 0 and rel is 1, out should be 3
Please let me know how to fill in the missing part.
I have already checked: Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
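For context, the error comes from Python's `and` keyword: it calls `bool()` on the left-hand Series, and pandas refuses to reduce a whole Series to a single truth value. The usual replacement is the element-wise operator `&`, with each comparison in its own parentheses, because `&` binds more tightly than `==`. A minimal sketch, with three rows of the dataset inlined in place of test.csv:

```python
import pandas as pd

# Three rows of the dataset, inlined instead of reading test.csv
df = pd.DataFrame({"real": [1, 0, 0], "rel": [0, 1, 0]})

# `and` forces bool() on a whole Series, which raises ValueError
try:
    df["real"] == 0 and df["rel"] == 0
    and_failed = False
except ValueError:
    and_failed = True

# `&` combines the two boolean Series element-wise
mask = (df["real"] == 0) & (df["rel"] == 0)
print(and_failed)     # True
print(mask.tolist())  # [False, False, True]
```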
Answer 0 (score: 5)
Use np.select. You can first define the set of conditions:
c1 = (df.real == 0) & (df.rel == 0)
c2 = (df.real == 1) & (df.rel == 1)
c3 = (df.real == 1) & (df.rel == 0)
c4 = (df.real == 0) & (df.rel == 1)
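Then, depending on which condition holds, pick the corresponding value from range(4) (c1 gives 0, c2 gives 1, and so on):
import numpy as np
df['out'] = np.select([c1,c2,c3,c4], range(4))
print(df)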
real rel out
0 1 0 2
1 0 1 3
2 1 1 1
3 0 1 3
4 0 0 0
5 0 0 0
6 1 1 1
7 1 1 1
8 0 0 0
9 0 1 3
10 1 0 2
11 1 1 1
12 0 1 3
13 1 0 2
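A self-contained version of the same idea, with the data inlined instead of read from test.csv. Listing the choices explicitly, rather than relying on `range(4)` lining up with the order of the conditions, keeps each condition next to its value. The `default=-1` sentinel is an arbitrary choice made here so that any row matching none of the conditions would stand out:

```python
import numpy as np
import pandas as pd

# The question's dataset, inlined instead of read from test.csv
df = pd.DataFrame({
    "real": [1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1],
    "rel":  [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0],
})

# (condition, value) pairs; np.select uses the first condition that is True
rules = [
    ((df["real"] == 0) & (df["rel"] == 0), 0),
    ((df["real"] == 1) & (df["rel"] == 1), 1),
    ((df["real"] == 1) & (df["rel"] == 0), 2),
    ((df["real"] == 0) & (df["rel"] == 1), 3),
]
conditions = [cond for cond, _ in rules]
choices = [value for _, value in rules]

# -1 marks rows that match no rule (none should in this data)
df["out"] = np.select(conditions, choices, default=-1)
print(df["out"].tolist())
```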
Answer 1 (score: 4)
when real is 0 and rel is 0, out should be 0
when real is 1 and rel is 1, out should be 1
when real is 1 and rel is 0, out should be 2
when real is 0 and rel is 1, out should be 3
These cases can be combined into a single expression:
df['out'] = df['rel'] + 2*(df['real'] != df['rel'])
print(df)
Output:
real rel out
0 1 0 2
1 0 1 3
2 1 1 1
3 0 1 3
4 0 0 0
5 0 0 0
6 1 1 1
7 1 1 1
8 0 0 0
9 0 1 3
10 1 0 2
11 1 1 1
12 0 1 3
13 1 0 2
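The formula works because `rel` already gives the right answer for the two matching cases (0 and 1), and the mismatch flag `real != rel` adds 2 for the other two, turning rel = 0 into 2 and rel = 1 into 3. A quick exhaustive check over all four input combinations; the `expected` dict simply restates the question's rules:

```python
import itertools

import pandas as pd

# Every (real, rel) combination and the value the question asks for
expected = {(0, 0): 0, (1, 1): 1, (1, 0): 2, (0, 1): 3}

combos = list(itertools.product([0, 1], repeat=2))
df = pd.DataFrame(combos, columns=["real", "rel"])

# rel gives 0/1 for matches; the mismatch flag (True counts as 1) adds 2
df["out"] = df["rel"] + 2 * (df["real"] != df["rel"])

for real, rel, out in df.itertuples(index=False):
    assert out == expected[(real, rel)], (real, rel, out)
print("formula matches all four cases")
```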
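Answer 2 (score: 2)
Hi, here is the answer to your question. Wrap each comparison in parentheses and combine them with & instead of and: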
df.loc[(df["real"]==0) & (df["rel"]==0),"out"] = 0
df.loc[(df["real"]==1) & (df["rel"]==1),"out"] = 1
df.loc[(df["real"]==1) & (df["rel"]==0),"out"] = 2
df.loc[(df["real"]==0) & (df["rel"]==1),"out"] = 3
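One side effect worth knowing: the first `.loc` assignment creates `out` with NaN in every row it does not touch, so the column typically ends up as float64 even after all four assignments. A self-contained sketch that casts back to integers at the end; the `astype(int)` step is an addition to the answer and is only safe once every row has been assigned, which the assertion checks:

```python
import pandas as pd

# The question's dataset, inlined instead of read from test.csv
df = pd.DataFrame({
    "real": [1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1],
    "rel":  [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0],
})

# Parenthesise each comparison and combine with &, not `and`
df.loc[(df["real"] == 0) & (df["rel"] == 0), "out"] = 0
df.loc[(df["real"] == 1) & (df["rel"] == 1), "out"] = 1
df.loc[(df["real"] == 1) & (df["rel"] == 0), "out"] = 2
df.loc[(df["real"] == 0) & (df["rel"] == 1), "out"] = 3

# Every row is now covered, so the NaN-padded column can be cast back to int
assert df["out"].notna().all()
df["out"] = df["out"].astype(int)
print(df["out"].tolist())
```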
Answer 3 (score: 1)
One possible solution is to create a helper DataFrame and merge it:
df1 = pd.DataFrame({'real': [0, 0, 1, 1], 'rel': [0, 1, 0, 1], 'new': [0, 3, 2, 1]})
print(df1)
real rel new
0 0 0 0
1 0 1 3
2 1 0 2
3 1 1 1
df = df.merge(df1, how='left')
print(df)
real rel new
0 1 0 2
1 0 1 3
2 1 1 1
3 0 1 3
4 0 0 0
5 0 0 0
6 1 1 1
7 1 1 1
8 0 0 0
9 0 1 3
10 1 0 2
11 1 1 1
12 0 1 3
13 1 0 2
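If the helper table is edited later, two things can go wrong silently: a duplicated (real, rel) key would multiply rows, and a missing key would leave NaN. `DataFrame.merge` can guard against the first with `validate="many_to_one"`, and `indicator=True` adds a `_merge` column that exposes the second. A sketch using the question's mapping; the lookup column is named `out` here to match the question, and the names `lookup` and `merged` are illustrative:

```python
import pandas as pd

# The question's dataset, inlined instead of read from test.csv
df = pd.DataFrame({
    "real": [1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1],
    "rel":  [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0],
})

# Lookup table: one row per (real, rel) combination
lookup = pd.DataFrame({
    "real": [0, 1, 1, 0],
    "rel":  [0, 1, 0, 1],
    "out":  [0, 1, 2, 3],
})

# validate raises MergeError if lookup has duplicate keys;
# indicator adds a "_merge" column showing which rows found a match
merged = df.merge(lookup, on=["real", "rel"], how="left",
                  validate="many_to_one", indicator=True)
assert (merged["_merge"] == "both").all(), "some rows had no matching rule"
df = merged.drop(columns="_merge")
print(df["out"].tolist())
```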
Answer 4 (score: 1)
You can use numpy.where to fill the column conditionally:
import numpy as np
df["new_column"] = np.nan
df["new_column"] = np.where((df["real"]==0) & (df["rel"]==0), 0, df["new_column"])
df["new_column"] = np.where((df["real"]==1) & (df["rel"]==1), 1, df["new_column"])
# ... etc. through the rest of your conditions.
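Instead of starting from NaN and overwriting the column once per condition, the `np.where` calls can also be nested so that each call's "else" branch is the next test. A sketch with all four rules written out; the final fallback (3) is assigned to any row that fails the first three tests, which is only correct because `real` and `rel` contain nothing but 0 and 1:

```python
import numpy as np
import pandas as pd

# The question's dataset, inlined instead of read from test.csv
df = pd.DataFrame({
    "real": [1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1],
    "rel":  [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0],
})

real, rel = df["real"], df["rel"]

# Each np.where's "else" branch is the next test; the last fallback covers (0, 1)
df["out"] = np.where((real == 0) & (rel == 0), 0,
                     np.where((real == 1) & (rel == 1), 1,
                              np.where((real == 1) & (rel == 0), 2, 3)))
print(df["out"].tolist())
```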