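I have the following data and want to combine the columns listed below into a single new binary column, where no = 0 and yes = 1. The features I want to combine into the new column are:
Ever told had congestive heart failure, Ever told you had coronary heart disease, Ever told you had angina/angina pectoris, Ever told you had heart attack, Ever told you had a stroke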
Age in years at screening 15881 non-null float64
Race/Hispanic origin 15881 non-null object
Ratio of family income to poverty 15881 non-null float64
Gender 15881 non-null object
year 15881 non-null object
60 sec. pulse (30 sec. pulse * 2) 15881 non-null float64
Weight (kg) 15881 non-null float64
Standing Height (cm) 15881 non-null float64
Waist Circumference (cm) 15881 non-null float64
Arm Circumference (cm) 15881 non-null float64
Ever told had congestive heart failure 15881 non-null object
Ever told you had coronary heart disease 15881 non-null object
Ever told you had angina/angina pectoris 15881 non-null object
Ever told you had heart attack 15881 non-null object
Ever told you had a stroke 15881 non-null object
Do you now smoke cigarettes? 15881 non-null object
Doctor told you have diabetes 15881 non-null object
How often drink alcohol over past 12 mos 15881 non-null float64
Sodium (mmol/L) 15881 non-null float64
Cholesterol, refrigerated serum (mg/dL) 15881 non-null float64
avg_systolic_blood_pres 15881 non-null float64
avg_diastolic_blood_pres 15881 non-null float64
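I am also worried that I might end up with more data than the original dataset has (15881 rows, 22 columns).
Answer 0 (score: 0)
If you want to create a new column that is True whenever any of the columns has the value 1, you can do the following: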
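import numpy as np
import pandas as pd
# Two random 0/1 columns standing in for the real "Ever told ..." columns
df = pd.DataFrame({'congestive': np.random.randint(2, size=10),
                   'coronary': np.random.randint(2, size=10)})
df['new'] = (df['congestive'] == 1) | (df['coronary'] == 1)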
Out[66]:
congestive coronary new
0 1 1 True
1 1 1 True
2 1 0 True
3 1 1 True
4 0 0 False
5 0 0 False
6 0 1 True
7 1 0 True
8 0 1 True
9 1 1 True
To convert True/False to 1/0, see "Is there a simple way to change a column of yes/no to 1/0 in a Pandas dataframe?".
Answer 1 (score: 0)
Assuming your data is in this format (the table is shown transposed, and the zeros are dummy values):
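Because the five columns in the question have dtype object and are described as yes/no answers, the comparison and the 1/0 conversion can be done in one step. The following is a minimal sketch, not a tested solution for the real dataset: the exact strings stored in the columns ("yes" / "no", all lowercase), the three-row sample frame, and the new column name `heart_condition` are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample; the real data has 15881 rows, and the answer strings
# are assumed to be exactly "yes" / "no".
df = pd.DataFrame({
    "Ever told had congestive heart failure": ["no", "yes", "no"],
    "Ever told you had coronary heart disease": ["no", "no", "no"],
    "Ever told you had angina/angina pectoris": ["no", "no", "no"],
    "Ever told you had heart attack": ["yes", "no", "no"],
    "Ever told you had a stroke": ["no", "no", "no"],
})

# The five "Ever told ..." columns (captured before the new column is added).
cols = list(df.columns)

# Compare every cell with "yes", reduce across the columns of each row,
# then cast the boolean result to 0/1.
df["heart_condition"] = df[cols].eq("yes").any(axis=1).astype(int)

print(df["heart_condition"].tolist())
```

If the real values are capitalized differently or padded with whitespace, they would need to be normalized first (for example with `.str.strip().str.lower()` on each column) before the comparison.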
15881 15882 15883
Q
Age_in_years_at_screening 0 0 0
Race/Hispanic_origin 0 0 0
Ratio_of_family_income_to_poverty 0 0 0
Gender 0 0 0
year 0 0 0
60_sec._pulse_(30_sec._pulse_*_2) 0 0 0
Weight_(kg) 0 0 0
Standing_Height_(cm) 0 0 0
Waist_Circumference_(cm) 0 0 0
Arm_Circumference_(cm) 0 0 0
Ever_told_had_congestive_heart_failure False False False
Ever_told_you_had_coronary_heart_disease True False False
Ever_told_you_had_angina/angina_pectoris True False True
Ever_told_you_had_heart_attack True False True
Ever_told_you_had_a_stroke True False True
Do_you_now_smoke_cigarettes? 0 0 0
Doctor_told_you_have_diabetes 0 0 0
How_often_drink_alcohol_over_past_12_mos 0 0 0
Sodium_(mmol/L) 0 0 0
Cholesterol_refrigerated_serum_(mg/dL) 0 0 0
avg_systolic_blood_pres 0 0 0
avg_diastolic_blood_pres 0 0 0
You can specify the questions of interest and process them:
import numpy as np
questions = ['Ever_told_had_congestive_heart_failure',
             'Ever_told_you_had_coronary_heart_disease',
             'Ever_told_you_had_angina/angina_pectoris',
             'Ever_told_you_had_heart_attack',
             'Ever_told_you_had_a_stroke']
# Row-wise logical OR across the selected boolean columns
df["Ever_told_combined"] = df[questions].apply(lambda row: np.logical_or.reduce(row), axis=1)
This adds the "Ever_told_combined" column to the data frame:
15881 True
15882 False
15883 True
dtype: bool
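On the worry about ending up with more data than the original: assigning a new column adds one column and leaves the number of rows unchanged, which can be checked with `df.shape`. The sketch below rebuilds the three example rows from the table above (the boolean values are the illustrative ones shown there, not real data) and uses `DataFrame.any(axis=1)` as an alternative way of taking the row-wise OR.

```python
import pandas as pd

questions = ['Ever_told_had_congestive_heart_failure',
             'Ever_told_you_had_coronary_heart_disease',
             'Ever_told_you_had_angina/angina_pectoris',
             'Ever_told_you_had_heart_attack',
             'Ever_told_you_had_a_stroke']

# Illustrative boolean data mirroring the three example respondents above.
df = pd.DataFrame(
    [[False, True, True, True, True],
     [False, False, False, False, False],
     [False, False, True, True, True]],
    index=[15881, 15882, 15883],
    columns=questions,
)

rows_before, cols_before = df.shape

# True if at least one of the selected columns is True in that row.
df["Ever_told_combined"] = df[questions].any(axis=1)

# One column was added; the row count did not change.
assert df.shape == (rows_before, cols_before + 1)
print(df["Ever_told_combined"])
```

Append `.astype(int)` to the assignment if a 0/1 column is preferred over True/False.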