I have a very sparse dataset; an example of its format is below. I want to modify specific columns according to the logic explained below.
# create dummy data set
pb=c('1','0','0','0','0','1','Not_ans','1','0','Not_ans')
qa=c('1','1','0','0','1','0','Not_ans','1','Not_ans','Not_ans')
#zy=c('1','Not_ans','0','1','Not_ans','0','1','1','1','Not_ans')
#sub questions for pb
pb.abr=c('1','0','0','0','0','1','0','1','0','0')
pb.ras=c('0','0','0','0','1','0','0','1','0','0')
pb.sfg=c('1','0','0','0','0','0','0','1','0','0')
#sub questions for qa
qa.fgs=c('1','0','0','0','0','0','0','1','0','0')
qa.sdf=c('0','1','0','0','0','0','0','0','0','0')
qa.tyu=c('0','0','0','0','1','0','0','1','0','0')
df=data.frame(pb,qa,pb.abr,pb.ras,pb.sfg,qa.fgs,qa.sdf,qa.tyu)
df
        pb      qa pb.abr pb.ras pb.sfg qa.fgs qa.sdf qa.tyu
1        1       1      1      0      1      1      0      0
2        0       1      0      0      0      0      1      0
3        0       0      0      0      0      0      0      0
4        0       0      0      0      0      0      0      0
5        0       1      0      1      0      0      0      1
6        1       0      1      0      0      0      0      0
7  Not_ans Not_ans      0      0      0      0      0      0
8        1       1      1      1      1      1      0      1
9        0 Not_ans      0      0      0      0      0      0
10 Not_ans Not_ans      0      0      0      0      0      0
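The two columns pb and qa are the base columns. Each has sub-columns that follow the naming convention of a "pb." or "qa." prefix, so there are three sub-columns for pb and three for qa. I want to change these sub-columns based on a condition on the corresponding base column (pb or qa).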
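The condition is: if column pb == 'Not_ans',
then set all of its sub-columns (pb.abr, pb.ras and pb.sfg) to 'Not_applicable'.
How can I write a function that does this, where I specify the base column name, e.g. pb, and the sub-column naming prefix, e.g. 'pb.'?
My attempt looks like the following, but it does not give the result: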
data.frame(ifelse(df['base_q']=='Not_ans',
df[ , grepl( paste('base_q','.') , names(df) )]=='Not_applicable',df[,grepl(
paste('base_q','.') , names(df)) ])
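How can I write a generic function that takes the base column numbers as input, e.g. 1, 2, and applies this logic: wherever pb is 'Not_ans', it changes the sub-columns (pb.abr, pb.ras, pb.sfg) to 'Not_applicable', then moves on to column 2 (qa) and applies the same logic?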
Answer 0 (score: 2)
You can use
yf=function(df,v){
df[df[v]=='Not_ans',][,names(df)[substr(names(df),1,nchar(v)+1)==paste0(v,'.')]]='Not_applicable'
return(df)
}
yf(df,'pb')
        pb      qa         pb.abr         pb.ras         pb.sfg qa.fgs qa.sdf qa.tyu
1        1       1              1              0              1      1      0      0
2        0       1              0              0              0      0      1      0
3        0       0              0              0              0      0      0      0
4        0       0              0              0              0      0      0      0
5        0       1              0              1              0      0      0      1
6        1       0              1              0              0      0      0      0
7  Not_ans Not_ans Not_applicable Not_applicable Not_applicable      0      0      0
8        1       1              1              1              1      1      0      1
9        0 Not_ans              0              0              0      0      0      0
10 Not_ans Not_ans Not_applicable Not_applicable Not_applicable      0      0      0
Data input
df=data.frame(pb,qa,pb.abr,pb.ras,pb.sfg,qa.fgs,qa.sdf,qa.tyu,stringsAsFactors = F)
# notice stringsAsFactors
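The question also asks for one function that handles several base columns, given either by position (1, 2) or by name. A minimal sketch of that idea in base R follows. The helper name mark_not_applicable() and the toy data frame are made up for illustration and are not part of the answer above; the sketch assumes that sub-columns are exactly the columns whose names start with the base name followed by a dot.

```r
# Hypothetical helper (not from the answer above): for each base column,
# set its sub-columns to "Not_applicable" on rows where the base is "Not_ans".
mark_not_applicable <- function(df, base_cols) {
  for (b in base_cols) {
    # Accept a column position (e.g. 1) or a column name (e.g. "pb").
    base_name <- if (is.numeric(b)) names(df)[b] else b
    # Sub-columns: names starting with "<base>." (literal match, no regex).
    sub_cols <- names(df)[startsWith(names(df), paste0(base_name, "."))]
    # Rows to change; NA in the base column is treated as "no change".
    rows <- !is.na(df[[base_name]]) & df[[base_name]] == "Not_ans"
    if (any(rows) && length(sub_cols) > 0) {
      # Convert to character first so factor columns do not turn into NA.
      df[sub_cols] <- lapply(df[sub_cols], as.character)
      df[rows, sub_cols] <- "Not_applicable"
    }
  }
  df
}

# Toy data, made up for this example.
toy <- data.frame(pb     = c("1", "Not_ans"),
                  qa     = c("Not_ans", "0"),
                  pb.abr = c("1", "0"),
                  qa.fgs = c("0", "1"),
                  stringsAsFactors = FALSE)

res <- mark_not_applicable(toy, c("pb", "qa"))  # or: mark_not_applicable(toy, c(1, 2))
```

Using startsWith() rather than grepl() avoids the regex meaning of ".", which would otherwise match any character.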
Answer 1 (score: 0)
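Here is one way. In mutate_at(), you specify in vars() the columns to which you want to apply one or more functions. Here I used contains() to select the columns by name. Then I replaced the values in those columns with "Not_applicable" wherever pb == "Not_ans".
library(dplyr)
mutate_at(df,
          vars(contains("pb.")),
          .funs = funs(ifelse(pb == "Not_ans",
                              "Not_applicable",
                              .)))
#         pb      qa         pb.abr         pb.ras         pb.sfg qa.fgs qa.sdf qa.tyu
#1        1       1              2              1              2      1      0      0
#2        0       1              1              1              1      0      1      0
#3        0       0              1              1              1      0      0      0
#4        0       0              1              1              1      0      0      0
#5        0       1              1              2              1      0      0      1
#6        1       0              2              1              1      0      0      0
#7  Not_ans Not_ans Not_applicable Not_applicable Not_applicable      0      0      0
#8        1       1              2              2              2      1      0      1
#9        0 Not_ans              1              1              1      0      0      0
#10 Not_ans Not_ans Not_applicable Not_applicable Not_applicable      0      0      0
The 1 and 2 values in the pb.* columns of this output appear to be factor level codes rather than the original "0" and "1", which suggests the columns were factors in this run. Creating the data frame with stringsAsFactors = FALSE should keep the original values.
If you want to apply the same task to both pb and qa, you can use mutate_at() twice.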
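In current dplyr, funs() is deprecated (since 0.8.0) and mutate_at() is superseded by across() (since 1.0.0). A minimal sketch of the same replacement written with across() follows, assuming dplyr >= 1.0.0 and character (not factor) columns. The toy data frame is made up for illustration.

```r
library(dplyr)

# Toy data, made up for this example; columns are kept as character.
toy <- data.frame(pb     = c("1", "Not_ans", "0"),
                  pb.abr = c("1", "0", "0"),
                  pb.ras = c("0", "0", "1"),
                  qa.fgs = c("1", "0", "0"),
                  stringsAsFactors = FALSE)

# For every column whose name starts with "pb.", replace the value with
# "Not_applicable" on rows where the base column pb is "Not_ans".
res <- toy %>%
  mutate(across(starts_with("pb."),
                ~ ifelse(pb == "Not_ans", "Not_applicable", .x)))
```

To cover qa as well, the same mutate() call can be repeated with starts_with("qa.") and qa == "Not_ans".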
Answer 2 (score: 0)
Based on the answer given by @Wen-Ben, the following code works:
class BMI:
    def __init__(self, firstName, lastName, age, height, weight):
        self.firstName = firstName
        self.lastName = lastName
        self.fullName = firstName + " " + lastName
        self.age = age
        # Stored as height squared in square metres (0.025 approximates inches to metres).
        self.height = (height * 0.025) ** 2
        # Stored in kilograms (0.45 approximates lbs to kg).
        self.weight = weight * 0.45

    def setFullName(self, firstName, lastName):
        self.firstName = firstName
        self.lastName = lastName
        self.fullName = firstName + " " + lastName
        print(self.fullName)

    def setAge(self, age):
        self.age = age

    def setHeight(self, height):
        self.height = (height * 0.025) ** 2

    def setWeight(self, weight):
        self.weight = weight * 0.45

    def getBMI(self):
        # Floor division, as in the original; use / for a fractional BMI.
        bmi = self.weight // self.height
        return bmi

    def getStatus(self):
        # Call the method on self and keep its result.
        bmi = self.getBMI()
        if bmi < 19:
            print("You have an unhealthy BMI, gain some weight!")
        elif bmi < 25:  # covers 19 <= bmi < 25, including exactly 19
            print("You have a healthy BMI")
        else:
            print("You have an unhealthy BMI, lose some weight!")


firstName = input("Enter your first name: ")
lastName = input("Enter your last name: ")
age = int(input("Enter your age: "))
height = int(input("Enter your height in inches: "))
weight = int(input("Enter your weight in lbs: "))
userInputBMI = BMI(firstName, lastName, age, height, weight)
# setFullName() and getStatus() print their own output and return None,
# so they are called directly instead of being wrapped in print().
userInputBMI.setFullName(firstName, lastName)
print("Your BMI is:", userInputBMI.getBMI())
userInputBMI.getStatus()