I have a dataframe which looks like this:
df = pd.DataFrame({"HouseholdNumber": [1, 1, 1, 1, 1, 2, 2], "TypeOfPerson": ["Son", "Daughter", "Daughter", "Parent", "Parent", "Daughter", "Parent"], "Age": [17, 10, 20, 52, 45, 22, 50]})
print(df)
   HouseholdNumber  TypeOfPerson  Age
0                1           Son   17
1                1      Daughter   10
2                1      Daughter   20
3                1        Parent   52
4                1        Parent   45
5                2      Daughter   22
6                2        Parent   50
I want to create a new variable using information from multiple rows. This is a problem for me because I can't get it to work with a simple df.loc (or np.where) condition. Specifically, I want the new variable to have the value no if the person is not a parent or has no child in either age group, a if the parent has a child who is 18 years old or younger, and b if the parent has a child who is between 19 and 25 years old. If the parents have children in both age groups, the value should still be a. The HouseholdNumber identifies the different families, so all the conditions should be applied per household. So, the dataframe should look like this:
   HouseholdNumber  TypeOfPerson  Age  Child
0                1           Son   17     no
1                1      Daughter   10     no
2                1      Daughter   20     no
3                1        Parent   52      a
4                1        Parent   45      a
5                2      Daughter   22     no
6                2        Parent   50      b
The code I'm trying is
df["Child"]=""
for i in df["HouseholdNumber"].unique():
    if (df.loc[df.TypeOfPerson.isin(["Son", "Daughter"]) & (df.Age <= 18)]):
        if (df.loc[(df.TypeOfPerson == "Parent")]):
            df["Child"] = "a"
    elif (df.loc[df.TypeOfPerson.isin(["Son", "Daughter"]) & ((df.Age >= 19) & (df.Age <= 26))]):
        df["Child"] = "b"
    else:
        df["Child"] = "no"
which gives me the error: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). I'm not really sure where to go from here; I always get this error. Even without the error, I suspect that my code would not give the desired result.
Answer 0 (score: 1)
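The error here is that you are indexing df.loc with a list-like indexer (here a boolean mask), so for example:
df.loc[df.TypeOfPerson.isin(["Son", "Daughter"]) & (df.Age <= 18)]
returns a dataframe containing several rows. So when you put it after an if, pandas has to ask how that dataframe should be evaluated as a boolean: is it True if any cell is True, or only if all cells are True, and so on.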
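The ambiguity can be reproduced in isolation. The following is a minimal sketch using the dataframe from the question; the names minors, raised, has_minors and has_rows are only illustrative and do not come from the original posts.

```python
import pandas as pd

df = pd.DataFrame({
    "HouseholdNumber": [1, 1, 1, 1, 1, 2, 2],
    "TypeOfPerson": ["Son", "Daughter", "Daughter", "Parent", "Parent", "Daughter", "Parent"],
    "Age": [17, 10, 20, 52, 45, 22, 50],
})

# Boolean-mask indexing returns a DataFrame (possibly several rows), not a single bool
minors = df.loc[df.TypeOfPerson.isin(["Son", "Daughter"]) & (df.Age <= 18)]

try:
    if minors:  # pandas refuses to guess what "truthy" should mean for a DataFrame
        pass
    raised = False
except ValueError as err:
    raised = True
    message = str(err)

# Unambiguous alternatives that each yield a plain Python bool
has_minors = not minors.empty
has_rows = len(minors) > 0
```

Both alternatives answer the question "did the filter match any row?", which is what the if in the question was meant to test.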
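One way to fix the error is to state that operation explicitly or, since in your case you only want to know whether the household has children, simply to check the length of the sliced dataframe.
Of course, this is just one way to solve the problem, not the best one.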
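As an illustration of the length check, here is one possible rewrite of the loop from the question. It is a sketch, not the answerer's own code: it assumes the age bands from the question's prose (18 or younger, 19 to 25 inclusive), restricts every filter to the current household, and uses helper names (is_child, is_parent, in_household) that are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "HouseholdNumber": [1, 1, 1, 1, 1, 2, 2],
    "TypeOfPerson": ["Son", "Daughter", "Daughter", "Parent", "Parent", "Daughter", "Parent"],
    "Age": [17, 10, 20, 52, 45, 22, 50],
})

is_child = df.TypeOfPerson.isin(["Son", "Daughter"])
is_parent = df.TypeOfPerson == "Parent"

# Default for children and for parents without a child in either age group
df["Child"] = "no"

for number in df["HouseholdNumber"].unique():
    in_household = df.HouseholdNumber == number

    # Slices restricted to this household
    minors = df.loc[in_household & is_child & (df.Age <= 18)]
    young_adults = df.loc[in_household & is_child & df.Age.between(19, 25)]

    # len(...) > 0 is an ordinary bool, so the if is no longer ambiguous
    if len(minors) > 0:
        df.loc[in_household & is_parent, "Child"] = "a"
    elif len(young_adults) > 0:
        df.loc[in_household & is_parent, "Child"] = "b"

print(df)
```

Because the check for minors comes first, a household with children in both age groups still gets a for its parents, as the question requires.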
Answer 1 (score: 1)
I would use groupby like this, since it lets you deal with one household at a time.
Example (note that not all cases are handled)
import pandas as pd

# Create the dataframe
df = pd.DataFrame(data={
    "TypeOfPerson": ["Son", "Parent", "Daughter", "Son", "Parent", "Daughter", "Daughter", "Parent", "Son"],
    "HouseholdNumber": [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "Age": [17, 50, 20, 13, 40, 19, 5, 50, 25]
})

# Add new column (object dtype so it can hold strings; every row starts as NaN)
df["Child"] = pd.Series(dtype="object")

# Group by household
households = df.groupby("HouseholdNumber")

# Iterate through groups
for household_number in households.groups:
    household = households.get_group(household_number)

    # Household offspring
    offspring = household.query("TypeOfPerson == 'Son' | TypeOfPerson == 'Daughter'")

    # Sons and daughters that are 18 or younger
    children = offspring.query("Age <= 18")

    # Sons and daughters that are young adults (19 <= age <= 25)
    young_adults = offspring.query("Age >= 19 & Age <= 25")

    # Parents
    parents = household.query("TypeOfPerson == 'Parent'")

    # Change original data frame; parents with no child aged 25 or younger stay NaN
    df.loc[offspring.index, "Child"] = "no"
    if children.shape[0]:
        df.loc[parents.index, "Child"] = "a"
    elif young_adults.shape[0]:
        df.loc[parents.index, "Child"] = "b"
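For comparison, the same per-household logic can also be written without an explicit loop. This is an alternative sketch rather than the answerer's approach: it swaps in groupby(...).transform("any") to broadcast a per-household flag to every row, and np.select to pick the label. It runs on the dataframe from the question, and the helper names (is_child, is_parent, has_minor, has_young_adult) are assumptions made for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "HouseholdNumber": [1, 1, 1, 1, 1, 2, 2],
    "TypeOfPerson": ["Son", "Daughter", "Daughter", "Parent", "Parent", "Daughter", "Parent"],
    "Age": [17, 10, 20, 52, 45, 22, 50],
})

is_child = df["TypeOfPerson"].isin(["Son", "Daughter"])
is_parent = df["TypeOfPerson"] == "Parent"

# Per-household flags, repeated on every row of the household
has_minor = (is_child & (df["Age"] <= 18)).groupby(df["HouseholdNumber"]).transform("any")
has_young_adult = (is_child & df["Age"].between(19, 25)).groupby(df["HouseholdNumber"]).transform("any")

# np.select takes the first matching condition, so "a" wins over "b";
# everything else (children, parents without a qualifying child) gets "no"
df["Child"] = np.select(
    [is_parent & has_minor, is_parent & has_young_adult],
    ["a", "b"],
    default="no",
)

print(df)
```

Unlike the loop above, this version also labels parents whose household has no child aged 25 or younger, since they fall through to the default.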