I have the following DataFrame, and once a condition is met I want to keep all of the rows for the matching "Individual ID" values.
import pandas as pd

data = [['A-1', 'Birth', '0'],
        ['A-1', 'Sickle cell', '5'],
        ['A-1', 'Lung cancer', '25'],
        ['A-1', 'Death', '35'],
        ['A-2', 'Birth', '0'],
        ['A-2', 'Sarcoma', '10'],
        ['A-2', 'Melanoma', '19'],
        ['A-2', 'Current Age', '20'],
        ['A-3', 'Birth', '0'],
        ['A-3', 'Sickle cell', '25'],
        ['A-3', 'Skin cancer', '29'],
        ['A-3', 'Current Age', '40']]
df = pd.DataFrame(data, columns=["Individual ID", "Diagnosis", "Age"])
print(df)
I tried the following code:
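# Keep only individuals with more than 3 rows
first = pd.DataFrame(df.groupby("Individual ID").filter(lambda g: g["Individual ID"].size > 3))
# "Repeat Instance" is a column in my full dataset; it is not part of the sample above
breast1 = ((first["Repeat Instance"] == 1) & (first["Diagnosis"] != "Sickle cell"))
after = first[breast1]
print(after)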
After running the code, I got:
  Individual ID    Diagnosis Age  Repeat Instance
1           A-1  Sickle cell   5                1
9           A-3  Sickle cell  25                1
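I want to get the rest of the information for individuals A-1 and A-3 (birth, current age, other diagnoses), but I can't figure out how.
Any help would be appreciated.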
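Answer 0 (score: 0)
How about the following approach?
You can create an additional column that holds the number of rows for each individual: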
df['size'] = df.groupby("Individual ID")["Individual ID"].transform('size')
After that, you can store the condition used to subset the DataFrame in a variable:
cond = (df['size'] > 3) & (df['Diagnosis']!="Sickle cell")
subset = df[cond].copy()
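If the goal is to keep every row for the individuals who have a particular diagnosis, one possible variation on the condition above is to collect the matching IDs first and then select on them with `isin`. This is only a sketch: the target diagnosis "Sickle cell" is an assumption inferred from the output shown in the question, and the names `target`, `ids`, and `has_target` are illustrative.

```python
import pandas as pd

data = [['A-1', 'Birth', '0'], ['A-1', 'Sickle cell', '5'],
        ['A-1', 'Lung cancer', '25'], ['A-1', 'Death', '35'],
        ['A-2', 'Birth', '0'], ['A-2', 'Sarcoma', '10'],
        ['A-2', 'Melanoma', '19'], ['A-2', 'Current Age', '20'],
        ['A-3', 'Birth', '0'], ['A-3', 'Sickle cell', '25'],
        ['A-3', 'Skin cancer', '29'], ['A-3', 'Current Age', '40']]
df = pd.DataFrame(data, columns=["Individual ID", "Diagnosis", "Age"])

# Number of rows per individual, broadcast back onto every row
df['size'] = df.groupby("Individual ID")["Individual ID"].transform('size')

# Assumption: the diagnosis of interest is "Sickle cell"
target = "Sickle cell"

# IDs of the individuals that have at least one row with the target diagnosis
ids = df.loc[df["Diagnosis"] == target, "Individual ID"].unique()

# Keep ALL rows of those individuals, not only the matching rows
has_target = df["Individual ID"].isin(ids)
subset = df[(df['size'] > 3) & has_target].copy()
print(subset)
```

With the sample data this keeps the four rows each for A-1 and A-3 and drops A-2, which has no "Sickle cell" row.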
Answer 1 (score: 0)
Here is an answer in plain Python:
df = pd.DataFrame(data, columns=["Individual ID", "Diagnosis", "Age"])
search = '0'
# Keep the rows of the raw list whose third element (the age) equals the search value
a = list(filter(lambda x: x[2] == search, data))
print(a)
It returns the list of rows whose third element is '0'; you can change the search value to suit your needs.
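For comparison, the same selection can be made on the DataFrame itself with a boolean mask. The sketch below uses a shortened copy of the question's data and assumes the ages stay stored as strings, as they are in the question; `matches` is an illustrative name.

```python
import pandas as pd

# Shortened copy of the question's data (ages are strings there)
data = [['A-1', 'Birth', '0'], ['A-1', 'Sickle cell', '5'],
        ['A-2', 'Birth', '0'], ['A-2', 'Sarcoma', '10'],
        ['A-3', 'Birth', '0'], ['A-3', 'Sickle cell', '25']]
df = pd.DataFrame(data, columns=["Individual ID", "Diagnosis", "Age"])

search = '0'

# List-based filter, as in the answer above
a = list(filter(lambda x: x[2] == search, data))

# Equivalent selection on the DataFrame using a boolean mask
matches = df[df["Age"] == search]
print(matches)
```

If the "Age" column were converted to integers, the comparison value would need to be the integer `0` rather than the string `'0'`.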