For example, I have a data frame with many rows and columns:
treatment gene1 gene2 gene3 …
A 0 3 0 …
A 0 0 0 …
A 0 0 0 …
A 1 1 0 …
A 0 0 0 …
B 0 1 1 …
B 0 5 2 …
B 0 0 3 …
B 0 0 0 …
… … … …
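I would like to derive a new data frame using the following rule: for each treatment and each gene, if all values are 0, the value for that gene in that treatment is 0 (for example, gene1 in treatment B); otherwise it is 1 (for example, gene1 in treatment A). The new data frame would therefore look like this: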
treatment gene1 gene2 gene3 …
A 1 1 0 …
B 0 1 1 …
… … … … …
Thank you very much for your help.
Answer 0 (score: 1)
With dplyr, you can do the following:
library(dplyr)
df %>%
  group_by(treatment) %>%
  summarise_all(list(~ as.integer(any(.))))
  treatment gene1 gene2 gene3
  <fct>     <int> <int> <int>
1 A             1     1     0
2 B             0     1     1
The same result can be obtained with base R.
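A minimal sketch of what a base R equivalent might look like, assuming `aggregate()` with a formula is an acceptable substitute for the grouped summary. This is one possible approach and not necessarily the code the answer originally had in mind. The `df` here is a plain `data.frame()` copy of the sample data given at the end of this thread, and `res` is a name introduced only for illustration.

```r
# Sample data, copied from the data definition at the end of this thread
df <- data.frame(
  treatment = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  gene1 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L),
  gene2 = c(3L, 0L, 0L, 1L, 0L, 1L, 5L, 0L, 0L),
  gene3 = c(0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L)
)

# For every gene column, grouped by treatment:
# 1 if any value in the group is non-zero, 0 otherwise
res <- aggregate(. ~ treatment, data = df,
                 FUN = function(x) as.integer(any(x != 0)))
res
```

Comparing with `x != 0` rather than passing the raw values to `any()` avoids the coercion warning that `any()` gives for non-integer numeric columns.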
Answer 1 (score: 0)
With base R:
+(rowsum(df[-1], df$treatment) > 0)
# gene1 gene2 gene3
#A 1 1 0
#B 0 1 1
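To make the one-liner above easier to follow, here is the same computation split into steps, as a sketch. The intermediate names `sums`, `flags` and `res` are introduced only for illustration, and `df` is a plain `data.frame()` copy of the sample data. It assumes the values are non-negative counts, as in the example: with mixed signs, a group sum could be 0 even though some values are non-zero.

```r
# Sample data, copied from the data definition in this thread
df <- data.frame(
  treatment = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  gene1 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L),
  gene2 = c(3L, 0L, 0L, 1L, 0L, 1L, 5L, 0L, 0L),
  gene3 = c(0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L)
)

# Step 1: column sums of the gene columns within each treatment
sums <- rowsum(df[-1], df$treatment)

# Step 2: logical matrix, TRUE where the group sum is positive
flags <- sums > 0

# Step 3: unary plus coerces TRUE/FALSE to 1/0
res <- +flags
res
```

The result is a matrix whose row names are the treatments, so the treatment labels are not a regular column as they are in the dplyr output.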
Data:
df <- structure(list(
  treatment = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  gene1 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L),
  gene2 = c(3L, 0L, 0L, 1L, 0L, 1L, 5L, 0L, 0L),
  gene3 = c(0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L)
), class = "data.frame", row.names = c(NA, -9L))