I have the following three DataFrames:
final_df
                                other  ref
(2014-12-24 13:20:00-05:00, a)    NaN  NaN
(2014-12-24 13:40:00-05:00, b)    NaN  NaN
(2018-07-03 14:00:00-04:00, d)    NaN  NaN
ref_df
                           a   b   c   d
2014-12-24 13:20:00-05:00  1   2   3   4
2014-12-24 13:40:00-05:00  2   3   4   5
2017-11-24 13:10:00-05:00  ..............
2018-07-03 13:25:00-04:00  ..............
2018-07-03 14:00:00-04:00  9  10  11  12
2019-07-03 13:10:00-04:00  ..............
other_df
                           a    b    c    d
2014-12-24 13:20:00-05:00  10   20   30   40
2014-12-24 13:40:00-05:00  20   30   40   50
2017-11-24 13:10:00-05:00  ..............
2018-07-03 13:20:00-04:00  ..............
2018-07-03 13:25:00-04:00  ..............
2018-07-03 14:00:00-04:00  90  100  110  120
2019-07-03 13:10:00-04:00  ..............
I need to replace the NaN values in final_df with the corresponding values from the other two DataFrames: for each (timestamp, column) pair in final_df's index, ref should come from ref_df and other from other_df. How can I do this?
Answer 0 (score: 2)
pandas.DataFrame.lookup (deprecated in pandas 1.2.0 and removed in 2.0, so this requires pandas < 2.0)
final_df['ref'] = ref_df.lookup(*zip(*final_df.index))
final_df['other'] = other_df.lookup(*zip(*final_df.index))
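Because DataFrame.lookup no longer exists in pandas 2.x, here is a minimal sketch of an equivalent pairwise lookup, assuming every (row, column) pair exists: translate the labels to integer positions with Index.get_indexer, then use NumPy fancy indexing. The helper name lookup_pairs is hypothetical, and the toy frames mirror the answer's setup.

```python
import numpy as np
import pandas as pd


def lookup_pairs(df: pd.DataFrame, rows, cols) -> np.ndarray:
    """Hypothetical stand-in for the removed DataFrame.lookup.

    Returns df.loc[r, c] for each (r, c) in zip(rows, cols).
    """
    # Label -> integer position; get_indexer returns -1 for labels it cannot find.
    row_pos = df.index.get_indexer(list(rows))
    col_pos = df.columns.get_indexer(list(cols))
    if (row_pos == -1).any() or (col_pos == -1).any():
        # A -1 would silently select the last row/column, so fail loudly instead.
        raise KeyError("one or more row/column labels not found")
    # Pairwise (not outer) indexing: element i is values[row_pos[i], col_pos[i]].
    return df.to_numpy()[row_pos, col_pos]


ref_df = pd.DataFrame(
    [[1, 2, 3, 4], [2, 3, 4, 5], [9, 10, 11, 12]],
    index=[1, 2, 3],
    columns=["a", "b", "c", "d"],
)
final_df = pd.DataFrame(
    index=pd.MultiIndex.from_tuples([(1, "a"), (2, "b"), (3, "d")]),
    columns=["other", "ref"],
)

# Split the (row, column) index tuples into row labels and column labels.
rows, cols = zip(*final_df.index)
final_df["ref"] = lookup_pairs(ref_df, rows, cols)
print(final_df["ref"].tolist())  # [1, 3, 12]
```

Unlike the old lookup, this sketch raises KeyError on unknown labels; for data with missing timestamps, the map/get or join approaches in this thread are a better fit.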
map and get (for when some labels are missing)
final_df['ref'] = list(map(ref_df.stack().get, final_df.index))
final_df['other'] = list(map(other_df.stack().get, final_df.index))
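The map/get version tolerates missing labels because stack() reshapes a wide frame into a Series keyed by (row label, column label) tuples, and Series.get returns its default (None) instead of raising when a key is absent. A small sketch, assuming a hypothetical index entry (4, "a") that has no row in ref_df:

```python
import pandas as pd

ref_df = pd.DataFrame(
    [[1, 2, 3, 4], [2, 3, 4, 5], [9, 10, 11, 12]],
    index=[1, 2, 3],
    columns=["a", "b", "c", "d"],
)
# Hypothetical index: (4, "a") has no row in ref_df, standing in for a missing timestamp.
final_index = pd.MultiIndex.from_tuples([(1, "a"), (2, "b"), (4, "a")])

# stack() reshapes ref_df into a Series keyed by (row, column), e.g. (2, "b") -> 3.
stacked = ref_df.stack()
# Series.get returns None instead of raising KeyError for absent keys.
values = list(map(stacked.get, final_index))
print(values)  # third entry is None
```

When such a list is assigned back to a column, pandas typically stores the None entries as missing values.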
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 'a'), (2, 'b'), (3, 'd')])
final_df = pd.DataFrame(index=idx, columns=['other', 'ref'])
ref_df = pd.DataFrame([
[ 1, 2, 3, 4],
[ 2, 3, 4, 5],
[ 9, 10, 11, 12]
], [1, 2, 3], ['a', 'b', 'c', 'd'])
other_df = pd.DataFrame([
[ 10, 20, 30, 40],
[ 20, 30, 40, 50],
[ 90, 100, 110, 120]
], [1, 2, 3], ['a', 'b', 'c', 'd'])
print(final_df, ref_df, other_df, sep='\n\n')
other ref
1 a NaN NaN
2 b NaN NaN
3 d NaN NaN
a b c d
1 1 2 3 4
2 2 3 4 5
3 9 10 11 12
a b c d
1 10 20 30 40
2 20 30 40 50
3 90 100 110 120
final_df['ref'] = ref_df.lookup(*zip(*final_df.index))
final_df['other'] = other_df.lookup(*zip(*final_df.index))
final_df
other ref
1 a 10 1
2 b 30 3
3 d 120 12
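The *zip(*final_df.index) idiom works because iterating a two-level MultiIndex yields (row label, column label) tuples, and zip(*...) transposes them into one sequence of row labels and one of column labels, the two positional arguments lookup expects. A standalone illustration with plain tuples standing in for the index:

```python
# Stand-in for iterating final_df.index: (row label, column label) pairs.
pairs = [(1, "a"), (2, "b"), (3, "d")]

# zip(*pairs) regroups the i-th elements: all row labels, then all column labels.
row_labels, col_labels = zip(*pairs)
print(row_labels)  # (1, 2, 3)
print(col_labels)  # ('a', 'b', 'd')

# So ref_df.lookup(*zip(*final_df.index)) is the same call as
# ref_df.lookup((1, 2, 3), ('a', 'b', 'd')).
```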
Answer 1 (score: 0)
Another solution, which also handles dates that are missing from ref_df and other_df:
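import pandas as pd
# Rebuild final_df's (timestamp, column) pairs as the index of an empty frame
index = pd.MultiIndex.from_tuples(final_df.index)
# stack() turns each wide frame into a Series keyed by (timestamp, column)
ref = ref_df.stack().rename('ref')
other = other_df.stack().rename('other')
# join is a left join by default: every pair is kept, absent ones become NaN
result = pd.DataFrame(index=index).join(ref).join(other)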
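A closely related sketch swaps the join for Series.reindex: reindexing each stacked Series on final_df.index aligns by (timestamp, column) and should leave NaN wherever a pair is absent, matching the left-join behavior above. The toy data is an assumption for illustration: ref_df deliberately lacks the 13:40 timestamp, while other_df has it.

```python
import pandas as pd

t1 = "2014-12-24 13:20:00-05:00"
t2 = "2014-12-24 13:40:00-05:00"
t3 = "2018-07-03 14:00:00-04:00"
cols = ["a", "b", "c", "d"]

# Assumed toy data: ref_df has no row for t2, other_df has all three timestamps.
ref_df = pd.DataFrame([[1, 2, 3, 4], [9, 10, 11, 12]], index=[t1, t3], columns=cols)
other_df = pd.DataFrame(
    [[10, 20, 30, 40], [20, 30, 40, 50], [90, 100, 110, 120]],
    index=[t1, t2, t3],
    columns=cols,
)
final_df = pd.DataFrame(
    index=pd.MultiIndex.from_tuples([(t1, "a"), (t2, "b"), (t3, "d")]),
    columns=["other", "ref"],
)

# stack() -> Series keyed by (timestamp, column); reindex aligns it to
# final_df's pairs and fills NaN where a pair does not exist.
final_df["ref"] = ref_df.stack().reindex(final_df.index)
final_df["other"] = other_df.stack().reindex(final_df.index)
print(final_df)
```

Writing into final_df directly keeps its original column order, whereas the join version builds a new frame with ref before other.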