I hope some experienced Pythonistas can help me find a more streamlined approach than my current code.
What I have / want:
I have a table that looks like this:
study id; rack position; box number; freeze-thaw cycles; new rack position; new box number; project code
24; A1; 10001; 1; A2; 11040; 1,2
25; B1; 10002; 0; A4; 11045; 1
26; C2; 10003; 0; A5; 13420; 2
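This is a minimal sketch of reading such a file. It assumes the separator is always a semicolon followed by one space, as in the sample. `RAW` is a hypothetical stand-in for the contents of `data.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv (header plus the first sample row).
RAW = """study id; rack position; box number; freeze-thaw cycles; new rack position; new box number; project code
24; A1; 10001; 1; A2; 11040; 1,2
"""

# sep=';' splits the columns; skipinitialspace=True drops the blank after each ';'
# (otherwise the names come out as ' rack position', ' box number', ...).
# dtype=str keeps values such as "1,2" and "10001" as plain text.
df = pd.read_csv(io.StringIO(RAW), sep=';', skipinitialspace=True, dtype=str)
print(df.columns.tolist())
```

Without `skipinitialspace=True`, every column name except the first starts with a space. A later lookup such as `df['rack position']` would then fail.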
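I want to parse it into the following format (id = study id, count = freeze-thaw cycles; multiple project codes are split up and put on separate rows):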
id; field; count; value
24; rack position; 1; A1
24; box number; 1; 10001
24; new rack position; 1; A2
24; new box number; 1; 11040
24; project code; 1; 1
24; project code; 1; 2
25; rack position; 0; B1
25; box number; 0; 10002
25; new rack position; 0; A4
25; new box number; 0; 11045
25; project code; 0; 1
26; and so on...
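The "one row per project code" part of this format maps onto pandas' `Series.str.split` followed by `DataFrame.explode`. The `ignore_index` argument of `explode` needs pandas 1.1 or newer. Here is a minimal sketch on a hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical mini-frame containing only the columns that matter here.
df = pd.DataFrame({'id': [24, 25, 26], 'project code': ['1,2', '1', '2']})

# str.split turns "1,2" into ['1', '2'] and "1" into ['1'];
# explode then emits one row per list element and repeats the other columns.
df['project code'] = df['project code'].str.split(',')
df = df.explode('project code', ignore_index=True)
print(df)
```

Rows with a single code become one-element lists. They pass through `explode` unchanged, so no empty slots are created.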
How I currently get there:
# import pandas
import pandas as pd
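# reading in the data (skipinitialspace drops the blank after each ";",
# dtype keeps "project code" as text so .str works even if no row contains a comma)
df = pd.read_table('data.csv', delimiter=';', skipinitialspace=True, dtype={'project code': str})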
# rename "study id" and "freeze-thaw cycles"
df = df.rename(columns={'study id': 'id', 'freeze-thaw cycles': 'count'})
# splitting "project code"
df = df.join(df['project code'].str.split(',', n=1, expand=True).rename(columns={0: 'Project code1', 1: 'Project code2'}))
# remove "project code"
df = df.drop(columns='project code')
# Split the dataframe based on Project code1 and Project code2
df1 = df[['box number', 'rack position', 'id', 'count', 'new box number',
'new rack position', 'Project code1']]
df2 = df[['box number', 'rack position', 'id', 'count', 'new box number',
'new rack position', 'Project code2']]
# rename Project code1 and Project code2 to Project code
df1 = df1.rename(columns={'Project code1': 'Project code'})
df2 = df2.rename(columns={'Project code2': 'Project code'})
# stack the two dataframes on top of each other
df = pd.concat([df1, df2], axis=0)  # axis=0 stacks rows; axis=1 would place the frames side by side
# convert the data frame into the long format
df = pd.melt(df, id_vars=['id', 'count'], var_name='field', value_name='value')
Problem:
Splitting "project code" results in many rows with a value of 0, like this:
id; field; count; value
32; project code; 1; 0
33; project code; 1; 0
34; project code; 1; 0
35; project code; 1; 0
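This is a possible patch for the existing code. It assumes the 0s are the empty (None/NaN) slots that `str.split(',', n=1, expand=True)` leaves in `Project code2` for rows with only one code. Those slots end up in `value` after the concat and melt, and they may show up as 0 depending on how the result is filled or exported.

The concat also repeats every other field for each id, once from `df1` and once from `df2`. Dropping the empty values and then exact duplicates after the melt handles both. The frame below is a hypothetical excerpt of such a melted result.

```python
import pandas as pd

# Hypothetical excerpt of the melted result for a single-code id (25):
# the concat repeated 'box number', and 'Project code2' was empty.
melted = pd.DataFrame({
    'id': [25, 25, 25, 25],
    'count': [0, 0, 0, 0],
    'field': ['box number', 'box number', 'Project code', 'Project code'],
    'value': ['10002', '10002', '1', None],
})

# Drop the empty project-code slots, then the duplicate rows introduced by concat.
cleaned = melted.dropna(subset=['value']).drop_duplicates(ignore_index=True)
print(cleaned)
```

If the values really are the integer 0 rather than missing values, filtering with `melted[melted['value'] != 0]` would be the equivalent first step.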
How can I fix this (ideally integrated into the code)? And is there a more streamlined approach (my code looks rather piecemeal)?
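One possible more compact route is to melt first, then split and explode. This avoids the df1/df2 split entirely. It is a sketch under a few assumptions:

- The file looks exactly like the sample above.
- Commas only ever appear inside "project code", since every melted value is split on ",".
- `to_long` and `RAW` are hypothetical names, with `RAW` standing in for `data.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv (the sample table from the question).
RAW = """study id; rack position; box number; freeze-thaw cycles; new rack position; new box number; project code
24; A1; 10001; 1; A2; 11040; 1,2
25; B1; 10002; 0; A4; 11045; 1
26; C2; 10003; 0; A5; 13420; 2
"""


def to_long(source) -> pd.DataFrame:
    # Read everything as text; skipinitialspace strips the blank after each ';'.
    df = pd.read_csv(source, sep=';', skipinitialspace=True, dtype=str)
    df = df.rename(columns={'study id': 'id', 'freeze-thaw cycles': 'count'})
    df[['id', 'count']] = df[['id', 'count']].astype(int)

    # One row per (id, field); fields keep the original column order.
    long = df.melt(id_vars=['id', 'count'], var_name='field', value_name='value')

    # "1,2" -> ['1', '2'], one row each; values without a comma are unaffected.
    long['value'] = long['value'].str.split(',')
    long = long.explode('value', ignore_index=True)

    # Group rows by id; mergesort is stable, so the field order within an id is kept.
    long = long.sort_values('id', kind='mergesort', ignore_index=True)
    return long[['id', 'field', 'count', 'value']]


if __name__ == '__main__':
    # With a real file: to_long('data.csv')
    print(to_long(io.StringIO(RAW)).to_string(index=False))
```

Because nothing is split into a fixed number of columns, no empty "second code" slots appear. Because the frame is never duplicated, no repeated fields appear either.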
Thanks!
Cheers, Birgit