I have a fairly complex DataFrame. It contains many blocks, split by date/time and by test item. The original Excel data:
name sex age ID start end main data testtime item subitem result unit mark reference testman comfirmman
LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11
2018-12-28 13:59 metabolism II comfirm 12345678
subitem result unit mark reference
Na 142 mmol/L 135 - 145
K 3.98 mmol/L 3.50 - 5.30
Cl 105 mmol/L 96 - 110
PHOS 1.25 mmol/L 0.97 - 1.62
testman:YYY comfirmman:AAA
2018-12-28 9:57 routine blood comfirm 12345678
subitem result unit mark reference
CRP 14.72 mg/L ↑ 0.00 - 10.00
WBC 6.73 x10^9/L 4.00 - 10.00
NEUT% 0.524 0.460 - 0.750
testman:BBB comfirmman:EEE
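I want to turn the header-like rows inside each block (test time, item, tester, confirmer) into columns, so that each test result becomes one row. What I want: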
name sex age ID start end main data testtime item subitem result unit mark reference testman comfirmman
LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11 2018-12-28 13:59 metabolism II Na 142 mmol/L 135 - 145 YYY AAA
LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11 2018-12-28 13:59 metabolism II K 3.98 mmol/L 3.50 - 5.30 YYY AAA
LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11 2018-12-28 13:59 metabolism II Cl 105 mmol/L 96 - 110 YYY AAA
LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11 2018-12-28 13:59 metabolism II PHOS 1.25 mmol/L 0.97 - 1.62 YYY AAA
LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11 2018-12-28 9:57 routine blood CRP 14.72 mg/L ↑ 0.00 - 10.00 BBB EEE
LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11 2018-12-28 9:57 routine blood WBC 6.73 x10^9/L 4.00 - 10.00 BBB EEE
LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11 2018-12-28 9:57 routine blood NEUT% 0.524 0.460 - 0.750 BBB EEE
Thanks in advance!
Answer 0 (score: 0)
You can do this by transposing the DataFrame with the transpose method (DataFrame.T):
transposed_dataframe = your_dataframe.T
Example:
import numpy as np
import pandas as pd
# Random example values
a = np.random.random(10)
b = np.random.random(10)
c = np.random.random(10)
df = pd.DataFrame({'a':a,'b':b,'c':c})
print('Original Dataframe')
print(df)
transposed_dataframe = df.T
print('Transposed Dataframe')
print(transposed_dataframe)
Output:
Original Dataframe
a b c
0 0.254146 0.017214 0.024618
1 0.958870 0.297118 0.935739
2 0.492764 0.626654 0.259336
3 0.979305 0.811364 0.321847
4 0.723043 0.570478 0.222365
5 0.717678 0.833348 0.188363
6 0.695006 0.712678 0.313900
7 0.071923 0.529029 0.018965
8 0.868739 0.152821 0.349268
9 0.766499 0.651031 0.109461
Transposed Dataframe
0 1 2 3 4 5 6 7 8 9
a 0.254146 0.958870 0.492764 0.979305 0.723043 0.717678 0.695006 0.071923 0.868739 0.766499
b 0.017214 0.297118 0.626654 0.811364 0.570478 0.833348 0.712678 0.529029 0.152821 0.651031
c 0.024618 0.935739 0.259336 0.321847 0.222365 0.188363 0.313900 0.018965 0.349268 0.109461
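A small check of what .T does, using a hypothetical 3×2 frame: it swaps the row and column axes and is its own inverse. It only flips the layout; by itself it does not expand the blocks in the question into one row per result.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 frame with columns "a" and "b"
df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])
t = df.T

print(t.shape)         # (2, 3): rows and columns swap places
print(list(t.index))   # ['a', 'b']: the old column labels become the index
print(t.T.equals(df))  # True: transposing twice restores the original
```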
Answer 1 (score: 0)
Extracting data from semi-structured Excel is always ugly:
import re
import numpy as np
import pandas as pd

data = '''name sex age ID start end main data testtime item subitem result unit mark reference testman comfirmman
LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11
2018-12-28 13:59 metabolism II comfirm 12345678
subitem result unit mark reference
Na 142 mmol/L 135 - 145
K 3.98 mmol/L 3.50 - 5.30
Cl 105 mmol/L 96 - 110
PHOS 1.25 mmol/L 0.97 - 1.62
testman:YYY comfirmman:AAA
2018-12-28 9:57 routine blood comfirm 12345678
subitem result unit mark reference
CRP 14.72 mg/L ↑ 0.00 - 10.00
WBC 6.73 x10^9/L 4.00 - 10.00
NEUT% 0.524 0.460 - 0.750
testman:BBB comfirmman:EEE '''
# first two rows are master data
h = [[t.strip() for t in re.split(" ", l) if t!=""] for l in data.split("\n")[:2] ]
h[0][:len(h[1])] # preview only: header trimmed to the number of values found in the master row
hf = pd.DataFrame(h[1:], columns=h[0][:len(h[1])])
# insert ID into detail data
d = [[hf.loc[0:,"ID"].values[0]]+[t.strip() for t in re.split(" ", l) if t.strip()!=""] for l in data.split("\n")[3:] ]
d[0][0] = "ID" # modify column header
df = pd.DataFrame(d[1:], columns=d[0])
# find the rows that have testman and confirmman
rows = df[df["subitem"].str.contains("testman")].index.values
# update each row with testman and confirmman
for i, r in enumerate(rows):
    rs = 0 if i==0 else rows[i-1]+1
    df.loc[rs:r-1, "testman"] = df.loc[r:r,"subitem"].values[0].replace("testman:", "")
    df.loc[rs:r-1, "confirmman"] = df.loc[r:r,"result"].values[0].replace("comfirmman:", "")
df.loc[df["unit"].isna(),"testman"] = np.nan # a bit more cleanup
# join it all together excluding detail rows that are not test results
hf.merge(df[~df["testman"].isna()], on="ID")
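The for loop above copies each testman/comfirmman footer onto the rows that precede it. A possibly simpler, vectorized sketch of that step, on a hypothetical toy frame that mimics the question's layout (the subitem/result columns and the footer format come from the question; the sample rows are illustrative): extract the names only on footer rows, back-fill them upward, then drop the footer rows.

```python
import pandas as pd

# Hypothetical toy frame: result rows followed by a footer row per block
df = pd.DataFrame({
    "subitem": ["Na", "K", "testman:YYY", "CRP", "WBC", "testman:BBB"],
    "result": ["142", "3.98", "comfirmman:AAA", "14.72", "6.73", "comfirmman:EEE"],
})

footer = df["subitem"].str.startswith("testman:")
# Keep the names on footer rows only; every other row becomes NaN
df["testman"] = df["subitem"].where(footer).str.replace("testman:", "", regex=False)
df["comfirmman"] = df["result"].where(footer).str.replace("comfirmman:", "", regex=False)
# Each footer belongs to the rows above it, so fill upward (backward fill)
df[["testman", "comfirmman"]] = df[["testman", "comfirmman"]].bfill()
# The footer rows themselves are not results, so drop them
df = df[~footer].reset_index(drop=True)
print(df)
```

Backward fill stops at the next footer on its own, so no explicit block id is needed as long as every block ends with exactly one footer line.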
Output
name sex age ID start end main data subitem result unit mark reference testman confirmman
0 LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11 Na 142 mmol/L 135 - 145 None None YYY AAA
1 LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11 K 3.98 mmol/L 3.50 - 5.30 None YYY AAA
2 LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11 Cl 105 mmol/L 96 - 110 None None YYY AAA
3 LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11 PHOS 1.25 mmol/L 0.97 - 1.62 None YYY AAA
4 LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11 subitem result unit mark reference BBB EEE
5 LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11 CRP 14.72 mg/L ↑ 0.00 - 10.00 BBB EEE
6 LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11 WBC 6.73 x10^9/L 4.00 - 10.00 None BBB EEE
7 LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11 NEUT% 0.524 0.460 - 0.750 None None BBB EEE
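Splitting on single spaces also breaks the datetimes ("2018-12-18 08:58") and the reference ranges ("135 - 145") into several tokens. An alternative is to recognize each kind of line with its own regular expression and build one record per result. This is a sketch under assumptions: the text is exactly as pasted in the question (space-separated, one record per line, the literal words comfirm, testman: and comfirmman:, an optional ↑/↓ flag, units that do not start with a digit), "main data" is read as two columns (main=knee, data=11), and parse_report is a hypothetical helper name. It needs Python 3.8+ for the := operator.

```python
import re

import pandas as pd

# Text exactly as pasted in the question (assumption: single-space separated)
RAW = """name sex age ID start end main data testtime item subitem result unit mark reference testman comfirmman
LSF female 60 12345678 2018-12-18 08:58 2018-12-29 08:30 knee 11
2018-12-28 13:59 metabolism II comfirm 12345678
subitem result unit mark reference
Na 142 mmol/L 135 - 145
K 3.98 mmol/L 3.50 - 5.30
Cl 105 mmol/L 96 - 110
PHOS 1.25 mmol/L 0.97 - 1.62
testman:YYY comfirmman:AAA
2018-12-28 9:57 routine blood comfirm 12345678
subitem result unit mark reference
CRP 14.72 mg/L ↑ 0.00 - 10.00
WBC 6.73 x10^9/L 4.00 - 10.00
NEUT% 0.524 0.460 - 0.750
testman:BBB comfirmman:EEE"""

# Patient row: start and end are "date time", i.e. two tokens each
MASTER = re.compile(
    r"(?P<name>\S+)\s+(?P<sex>\S+)\s+(?P<age>\d+)\s+(?P<ID>\d+)\s+"
    r"(?P<start>\S+\s+\S+)\s+(?P<end>\S+\s+\S+)\s+(?P<main>\S+)\s+(?P<data>\S+)"
)
# Block header: "<date> <time> <item, may contain spaces> comfirm <ID>"
BLOCK = re.compile(
    r"(?P<testtime>\d{4}-\d{2}-\d{2}\s+\d{1,2}:\d{2})\s+(?P<item>.+?)\s+comfirm\s+\d+"
)
# Result row: reference is "low - high"
DETAIL = re.compile(
    r"(?P<subitem>\S+)\s+(?P<result>\S+)"
    r"(?:\s+(?P<unit>(?!\d)\S+))?"  # optional unit, assumed not to start with a digit
    r"(?:\s+(?P<mark>[↑↓]))?"       # optional high/low flag
    r"\s+(?P<reference>\S+\s*-\s*\S+)"
)
# Block footer with the people who ran and confirmed the test
FOOTER = re.compile(r"testman:(?P<testman>\S+)\s+comfirmman:(?P<comfirmman>\S+)")


def parse_report(text):
    """Flatten the block layout into one row per result (hypothetical helper)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    master, block, pending, rows = {}, {}, [], []
    for line in lines[1:]:  # lines[0] is the wide column header
        if m := FOOTER.fullmatch(line):
            # The footer closes a block: stamp its names on the rows collected so far
            rows.extend({**r, **m.groupdict()} for r in pending)
            pending = []
        elif m := BLOCK.fullmatch(line):
            block = m.groupdict()
        elif line.startswith("subitem "):
            continue  # sub-header repeated inside every block
        elif m := MASTER.fullmatch(line):
            master = m.groupdict()
        elif m := DETAIL.fullmatch(line):
            pending.append({**master, **block, **m.groupdict()})
    return pd.DataFrame(rows)


df = parse_report(RAW)
print(df[["testtime", "item", "subitem", "result", "unit", "mark", "testman"]])
```

Because rows are only emitted when a footer line is reached, results in a block without a trailing testman line would be dropped; adjust that rule if the real export differs.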