ID Date T Country
1 2/5/12 120 US
1 2/4/13 110 US
1 3/4/12 120 France
2 3/4/12 110 US
2 3/5/12 140 US
3 3/4/12 133 US
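I'm trying to write code that, for each unique ID, checks whether column T drops below a threshold (i.e. below 110) or whether the country changes. If either happens, I want another column called Treatment containing 1 for that ID. How can I do this?
Basically:
For a given ID: if T < 110 -> 1; if the country changes -> 1; else -> 0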
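The rule above can be written as a small predicate over the rows belonging to one ID. This is a minimal sketch in plain Python; the function name `treatment_flag` is hypothetical, and it assumes the threshold is a strict `T < 110`, which matches the expected output (ID 2 has T = 110 and gets 0).

```python
def treatment_flag(ts, countries, threshold=110):
    """Hypothetical helper: return 1 if any T is below the threshold
    or the ID has more than one distinct country, else 0.

    `ts` and `countries` are the T and Country values of one ID's rows.
    """
    below_threshold = any(t < threshold for t in ts)
    country_changed = len(set(countries)) > 1
    return int(below_threshold or country_changed)


# Rows for each ID from the sample data in the question
print(treatment_flag([120, 110, 120], ["US", "US", "France"]))  # ID 1 -> 1
print(treatment_flag([110, 140], ["US", "US"]))                  # ID 2 -> 0
print(treatment_flag([133], ["US"]))                             # ID 3 -> 0
```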
Expected output:
ID Date T Country Treatment
1 2/5/12 120 US 1
1 2/4/13 110 US 1
1 3/4/12 120 France 1
2 3/4/12 110 US 0
2 3/5/12 140 US 0
3 3/4/12 133 US 0
Answer 0 (score: 1)
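Use `groupby` and `apply` to get a boolean Series indicating whether the condition is met for each ID, and `astype` to convert it to 0/1. Once that is done, use `map` on the ID column.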
```python
def check_condition(grp):
    return (grp['T'] < 110).any() | (grp['Country'].nunique() > 1)

cond_map = df.groupby('ID').apply(check_condition).astype(int)
df['Treatment'] = df['ID'].map(cond_map)
```
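Put together with the sample data from the question, the approach above can be run end to end as below. The DataFrame construction is borrowed from the other answer; the only change to the `groupby`/`apply`/`map` sequence is that just the two columns the check needs are passed to `apply`.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"ID": [1, 1, 1, 2, 2, 3],
                   "Date": ["2/5/12", "2/4/13", "3/4/12", "3/4/12", "3/5/12", "3/4/12"],
                   "T": [120, 110, 120, 110, 140, 133],
                   "Country": ["US", "US", "France", "US", "US", "US"]})


def check_condition(grp):
    # True if any T in this ID's rows is below 110, or the rows span more than one country
    return (grp['T'] < 110).any() | (grp['Country'].nunique() > 1)


# One 0/1 value per ID, indexed by ID (only the columns the check needs are passed in)
cond_map = df.groupby('ID')[['T', 'Country']].apply(check_condition).astype(int)
# Broadcast each ID's value back onto every row with that ID
df['Treatment'] = df['ID'].map(cond_map)
print(df)
```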
Alternatively, if you don't want to create the intermediate `cond_map`, you can put the `groupby` inside the `map`:
```python
df['Treatment'] = df['ID'].map(df.groupby('ID').apply(check_condition).astype(int))
```
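A different technique, not used in either answer, skips the Python-level `apply` and lets `groupby(...).transform` copy per-ID aggregates straight back onto the rows: the minimum T of an ID is below 110 exactly when some T in that ID is below 110, and `nunique` counts the distinct countries. A sketch, assuming the same sample data as the question:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 2, 2, 3],
                   "Date": ["2/5/12", "2/4/13", "3/4/12", "3/4/12", "3/5/12", "3/4/12"],
                   "T": [120, 110, 120, 110, 140, 133],
                   "Country": ["US", "US", "France", "US", "US", "US"]})

grouped = df.groupby('ID')
# Each row gets the minimum T of its ID, compared with the threshold
below_threshold = grouped['T'].transform('min') < 110
# Each row gets the number of distinct countries of its ID
country_changed = grouped['Country'].transform('nunique') > 1
df['Treatment'] = (below_threshold | country_changed).astype(int)
print(df)
```

Because `transform` returns a result aligned to the original rows, no separate `map` step is needed.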
Resulting output:
ID Date T Country Treatment
0 1 2/5/12 120 US 1
1 1 2/4/13 110 US 1
2 1 3/4/12 120 France 1
3 2 3/4/12 110 US 0
4 2 3/5/12 140 US 0
5 3 3/4/12 133 US 0
Answer 1 (score: 0)
Using the power of pandas:
```python
import pandas as pd

# Future note: if you could include your sample data like this, it would save
# those who are trying to help you a LOT of time :)
df = pd.DataFrame({"ID": [1, 1, 1, 2, 2, 3],
                   "Date": ["2/5/12", "2/4/13", "3/4/12", "3/4/12", "3/5/12", "3/4/12"],
                   "T": [120, 110, 120, 110, 140, 133],
                   "Country": ["US", "US", "France", "US", "US", "US"]})

# Using a dictionary to map into the original DataFrame
d = {}
# For each row
for i in range(len(df["ID"].values)):
    unique_id = df["ID"][i]
    # Breaking the original data into rows to check each
    # instance of 'T'
    sub_frame = df.loc[i, :]
    # Checks both cases ('T'<110 and unique('Country')>1) at once
    if sub_frame["T"] < 110 or len(df.loc[df["ID"] == unique_id, "Country"].unique()) > 1:
        d[unique_id] = 1
    elif unique_id not in d:
        # Default to 0, but never overwrite a 1 set by an earlier row of the same ID
        d[unique_id] = 0

df["Treatment"] = df["ID"].map(d)
print(df)
```
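The loop above visits every row and re-checks the whole ID each time. A variant that loops over `df.groupby("ID")` instead evaluates each ID exactly once, so one row's result can never overwrite another's. This is a sketch of that alternative, not the answerer's code:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 2, 2, 3],
                   "Date": ["2/5/12", "2/4/13", "3/4/12", "3/4/12", "3/5/12", "3/4/12"],
                   "T": [120, 110, 120, 110, 140, 133],
                   "Country": ["US", "US", "France", "US", "US", "US"]})

d = {}
# Each iteration yields one ID and the sub-DataFrame holding all of its rows
for unique_id, grp in df.groupby("ID"):
    below_threshold = (grp["T"] < 110).any()
    country_changed = grp["Country"].nunique() > 1
    d[unique_id] = int(below_threshold or country_changed)

df["Treatment"] = df["ID"].map(d)
print(df)
```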
Country Date ID T Treatment
0 US 2/5/12 1 120 1
1 US 2/4/13 1 110 1
2 France 3/4/12 1 120 1
3 US 3/4/12 2 110 0
4 US 3/5/12 2 140 0
5 US 3/4/12 3 133 0
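Note: your question asks to check each unique ID, but since you want to find `T < 110` for every instance, you can't do that for each unique ID directly: a single ID has multiple instances, so you would be comparing the array [120, 110, 120] with 110.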
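To make the point concrete: comparing ID 1's T values with 110 yields one boolean per row rather than a single answer, so the array has to be reduced before it can describe the whole ID. A reduction such as `.any()`, which the first answer uses, does exactly that (the variable names here are illustrative):

```python
import pandas as pd

# T values of ID 1 from the sample data
t_for_id_1 = pd.Series([120, 110, 120])

# Elementwise comparison: one boolean per row
row_flags = t_for_id_1 < 110
print(row_flags.tolist())  # [False, False, False]

# Reduce to a single per-ID answer
print(bool(row_flags.any()))  # False
```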