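Note: I previously asked a similar question about the same data here, but now I am trying to merge the dataframes in a different way.
I have two dataframes that store different types of patient medical information. The elements common to both are the encounter ID (hadm_id) and the time the information was recorded ((n|c)e_charttime).
One dataframe (ds) contains structured information; the other (dn) contains a column holding the clinical note recorded at the given time for the encounter. Both dataframes contain multiple encounters, and the common element is the encounter ID (hadm_id).
Here are examples of the dataframes: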
ds
hadm_id ce_charttime hr sbp dbp
0 140694 2121-08-12 19:00:00 67.0 102.0 75.0
1 140694 2121-08-12 19:45:00 68.0 135.0 68.0
2 140694 2121-08-12 20:00:00 70.0 153.0 94.0
3 171544 2153-09-06 14:11:00 80.0 114.0 50.0
4 171544 2153-09-06 17:30:00 80.0 114.0 50.0
5 171544 2153-09-06 17:35:00 80.0 114.0 50.0
6 171544 2153-09-06 17:40:00 76.0 115.0 51.0
7 171544 2153-09-06 17:45:00 79.0 117.0 53.0
dn
hadm_id ne_charttime note
0 140694 2121-08-10 20:32:00 some text1
1 140694 2121-08-11 12:57:00 some text2
2 140694 2121-08-11 15:18:00 some text3
3 171544 2153-09-05 15:09:00 some text4
4 171544 2153-09-05 17:43:00 some text5
5 171544 2153-09-06 10:36:00 some text6
6 171544 2153-09-06 15:55:00 some text7
7 171544 2153-09-06 17:12:00 some text8
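The actual data covers nearly 10,000 encounters, with more than 250,000 rows of structured data and 50,000 rows of clinical notes.
I want to merge them based on the time the information was charted. For example, if you take one encounter from both dataframes and sort its rows by chart time, I want the resulting dataframe to contain all of the information, with NaN marking missing values. Given the two dataframes above, the result would look like this: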
final
hadm_id charttime ce_charttime hr sbp dbp ne_charttime note
0 140694 2121-08-10 20:32:00 NaT NaN NaN NaN 2121-08-10 20:32:00 some text1
1 140694 2121-08-11 12:57:00 NaT NaN NaN NaN 2121-08-11 12:57:00 some text2
2 140694 2121-08-11 15:18:00 NaT NaN NaN NaN 2121-08-11 15:18:00 some text3
3 140694 2121-08-12 19:00:00 2121-08-12 19:00:00 67.0 102.0 75.0 NaT NaN
4 140694 2121-08-12 19:45:00 2121-08-12 19:45:00 68.0 135.0 68.0 NaT NaN
5 140694 2121-08-12 20:00:00 2121-08-12 20:00:00 70.0 153.0 94.0 NaT NaN
6 171544 2153-09-05 15:09:00 NaT NaN NaN NaN 2153-09-05 15:09:00 some text4
7 171544 2153-09-05 17:43:00 NaT NaN NaN NaN 2153-09-05 17:43:00 some text5
8 171544 2153-09-06 10:36:00 NaT NaN NaN NaN 2153-09-06 10:36:00 some text6
9 171544 2153-09-06 14:11:00 2153-09-06 14:11:00 80.0 114.0 50.0 NaT NaN
10 171544 2153-09-06 15:55:00 NaT NaN NaN NaN 2153-09-06 15:55:00 some text7
11 171544 2153-09-06 17:12:00 NaT NaN NaN NaN 2153-09-06 17:12:00 some text8
12 171544 2153-09-06 17:30:00 2153-09-06 17:30:00 80.0 114.0 50.0 NaT NaN
13 171544 2153-09-06 17:35:00 2153-09-06 17:35:00 80.0 114.0 50.0 NaT NaN
14 171544 2153-09-06 17:40:00 2153-09-06 17:40:00 76.0 115.0 51.0 NaT NaN
15 171544 2153-09-06 17:45:00 2153-09-06 17:45:00 79.0 117.0 53.0 NaT NaN
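I typed this resulting dataframe by hand, and I would like to generate it with pandas. Eventually I will drop ce_charttime and ne_charttime, keep only the newly created charttime column, and fill in the missing values appropriately later. Any help is appreciated; please let me know if more information is needed.
Thanks.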
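Answer 0 (score: 0)
"Eventually I will drop ce_charttime and ne_charttime and keep only the newly created charttime"
You can do that before joining the two dataframes, and then use the pandas concat function to append them into a single dataframe.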
import pandas as pd

# here I'm assuming both dataframes share a column `charttime` on the same axis
# parse_dates names the column to convert; the timestamps are in ISO format,
# so no custom date parser is needed
data1 = pd.read_csv('data1.csv', parse_dates=['charttime'])
data2 = pd.read_csv('data2.csv', parse_dates=['charttime'])
print(data1.head(10), end='\n\n')
print(data2.head(10), end='\n\n')
# stack the rows; columns missing from one frame are filled with NaN
data = pd.concat([data1, data2], axis=0, sort=True)
# sort by encounter first so rows from different encounters do not interleave
data.sort_values(by=['hadm_id', 'charttime'], inplace=True)
data.reset_index(drop=True, inplace=True)
print(data.head(20))
Here is the output of the code above:
hadm_id charttime hr sbp dbp
0 140694 2121-08-12 19:00:00 67.0 102.0 75.0
1 140694 2121-08-12 19:45:00 68.0 135.0 68.0
2 140694 2121-08-12 20:00:00 70.0 153.0 94.0
3 171544 2153-09-06 14:11:00 80.0 114.0 50.0
4 171544 2153-09-06 17:30:00 80.0 114.0 50.0
5 171544 2153-09-06 17:35:00 80.0 114.0 50.0
6 171544 2153-09-06 17:40:00 76.0 115.0 51.0
7 171544 2153-09-06 17:45:00 79.0 117.0 53.0
hadm_id charttime note
0 140694 2121-08-10 20:32:00 some text1
1 140694 2121-08-11 12:57:00 some text2
2 140694 2121-08-11 15:18:00 some text3
3 171544 2153-09-05 15:09:00 some text4
4 171544 2153-09-05 17:43:00 some text5
5 171544 2153-09-06 10:36:00 some text6
6 171544 2153-09-06 15:55:00 some text7
7 171544 2153-09-06 17:12:00 some text8
charttime dbp hadm_id hr note sbp
0 2121-08-10 20:32:00 NaN 140694 NaN some text1 NaN
1 2121-08-11 12:57:00 NaN 140694 NaN some text2 NaN
2 2121-08-11 15:18:00 NaN 140694 NaN some text3 NaN
3 2121-08-12 19:00:00 75.0 140694 67.0 NaN 102.0
4 2121-08-12 19:45:00 68.0 140694 68.0 NaN 135.0
5 2121-08-12 20:00:00 94.0 140694 70.0 NaN 153.0
6 2153-09-05 15:09:00 NaN 171544 NaN some text4 NaN
7 2153-09-05 17:43:00 NaN 171544 NaN some text5 NaN
8 2153-09-06 10:36:00 NaN 171544 NaN some text6 NaN
9 2153-09-06 14:11:00 50.0 171544 80.0 NaN 114.0
10 2153-09-06 15:55:00 NaN 171544 NaN some text7 NaN
11 2153-09-06 17:12:00 NaN 171544 NaN some text8 NaN
12 2153-09-06 17:30:00 50.0 171544 80.0 NaN 114.0
13 2153-09-06 17:35:00 50.0 171544 80.0 NaN 114.0
14 2153-09-06 17:40:00 51.0 171544 76.0 NaN 115.0
15 2153-09-06 17:45:00 53.0 171544 79.0 NaN 117.0
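If the frames still carry the original ce_charttime and ne_charttime columns, as in the question, the shared column has to be created first. The following is only a minimal sketch under that assumption: the small ds and dn frames are stand-ins built from a few of the question's rows, and the names ds_t, dn_t, cols and slim are made up for illustration. It copies each timestamp into charttime, keeps the originals so the result has the same columns as the hand-typed final table, and sorts by encounter and then by time.

```python
import pandas as pd

# Small stand-ins for the question's ds and dn (a few rows each).
ds = pd.DataFrame({
    "hadm_id": [140694, 140694, 171544, 171544],
    "ce_charttime": pd.to_datetime([
        "2121-08-12 19:00:00", "2121-08-12 19:45:00",
        "2153-09-06 14:11:00", "2153-09-06 17:30:00",
    ]),
    "hr": [67.0, 68.0, 80.0, 80.0],
    "sbp": [102.0, 135.0, 114.0, 114.0],
    "dbp": [75.0, 68.0, 50.0, 50.0],
})
dn = pd.DataFrame({
    "hadm_id": [140694, 171544, 171544],
    "ne_charttime": pd.to_datetime([
        "2121-08-10 20:32:00", "2153-09-06 10:36:00", "2153-09-06 15:55:00",
    ]),
    "note": ["some text1", "some text6", "some text7"],
})

# Copy each source timestamp into a shared `charttime` column,
# keeping the original columns alongside it.
ds_t = ds.assign(charttime=ds["ce_charttime"])
dn_t = dn.assign(charttime=dn["ne_charttime"])

# Stack the rows (missing columns become NaN/NaT), then order by
# encounter and by chart time within each encounter.
final = (
    pd.concat([ds_t, dn_t], axis=0, sort=False)
    .sort_values(["hadm_id", "charttime"], kind="stable")
    .reset_index(drop=True)
)

# Arrange the columns as in the question's `final` table.
cols = ["hadm_id", "charttime", "ce_charttime", "hr", "sbp", "dbp",
        "ne_charttime", "note"]
final = final[cols]
print(final)

# Dropping the two source columns gives the slimmer layout.
slim = final.drop(columns=["ce_charttime", "ne_charttime"])
```

Sorting on hadm_id as well as charttime matters once the data holds many encounters whose dates overlap; sorting on charttime alone would mix rows from different encounters.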