How can I simplify my code (+ a column-splitting problem)?

Asked: 2018-08-09 08:49:29

Tags: python pandas csv parsing

I'm hoping some experienced Pythonistas can help me find a more streamlined approach for my current code.

What I have / want
I have a table that looks like this:

study id; rack position; box number; freeze-thaw cycles; new rack position; new box number; project code 
24; A1; 10001; 1; A2; 11040; 1,2 
25; B1; 10002; 0; A4; 11045; 1 
26; C2; 10003; 0; A5; 13420; 2

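I would like to parse it into the following format (id = study id, count = freeze-thaw cycles; multiple project codes are split and placed on separate rows):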

id; field; count; value
24; rack position; 1; A1
24; box number; 1; 10001
24; new rack position; 1; A2
24; new box number; 1; 11040
24; project code; 1; 1
24; project code; 1; 2
25; rack position; 0; B1
25; box number; 0; 10002
25; new rack position; 0; A4
25; new box number; 0; 11045
25; project code; 0; 1
26; and so on...

How I get my result:

# import pandas
import pandas as pd

# reading in the data (skipinitialspace drops the blank after each ';')
df = pd.read_csv('data.csv', sep=';', skipinitialspace=True)

# rename "study id" and "freeze-thaw cycles"
df = df.rename(columns={'study id': 'id', 'freeze-thaw cycles': 'count'})

# splitting "project code" into at most two columns (n and expand must be keyword arguments in current pandas)
df = df.join(df['project code'].str.split(',', n=1, expand=True).rename(columns={0: 'Project code1', 1: 'Project code2'}))

# remove "project code"
df = df.drop(columns='project code')

# Split the dataframe based on Project code1 and Project code2
df1 = df[['box number', 'rack position', 'id', 'count', 'new box number',
          'new rack position', 'Project code1']]
df2 = df[['box number', 'rack position', 'id', 'count', 'new box number',
          'new rack position', 'Project code2']]

# rename Project code1 and Project code2 back to "project code"
df1 = df1.rename(columns={'Project code1': 'project code'})
df2 = df2.rename(columns={'Project code2': 'project code'})

# stack the two dataframes on top of each other
df = pd.concat([df1, df2], axis=0)  # axis=0 appends rows, axis=1 would add columns

# convert the data frame into the long format
df = pd.melt(df, id_vars=['id', 'count'], var_name='field', value_name='value')

Problem:
Splitting "project code" results in many rows with the value 0, like this:

id; field; count; value
32; project code; 1; 0
33; project code; 1; 0
34; project code; 1; 0
35; project code; 1; 0
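
A plausible source of these rows, shown as a minimal sketch rather than the original script: when a row holds only one project code, `str.split(..., expand=True)` leaves the second column empty, and that empty cell later becomes its own "project code" row after `concat` and `melt`. The toy series `codes` and the names `parts` and `n_missing` are made up for illustration; the question does not show how the empty value ends up displayed as 0.

```python
import pandas as pd

# Toy data mirroring the "project code" column of the sample table
codes = pd.Series(["1,2", "1", "2"], name="project code")

# Splitting into two columns leaves a missing value where there is no second code
parts = codes.str.split(",", n=1, expand=True)
print(parts)

# Each missing second code would turn into one unwanted row in the long format
n_missing = int(parts[1].isna().sum())
print(n_missing)
```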

How can I solve this (ideally integrated into the code)? And is there a more streamlined approach (my code looks rather piecemeal)?
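
One possible shape for a shorter version, offered as a sketch under assumptions rather than a definitive answer: split "project code" into lists, melt once, and let `DataFrame.explode` give every code its own row, so no second column (and no empty filler rows) is ever created. It assumes pandas 0.25 or later, which is newer than this question. The inline string `raw` stands in for `data.csv`, and `tidy` is an illustrative name.

```python
import io
import pandas as pd

# Sample rows copied from the table above; a real run would read 'data.csv' instead
raw = """study id; rack position; box number; freeze-thaw cycles; new rack position; new box number; project code
24; A1; 10001; 1; A2; 11040; 1,2
25; B1; 10002; 0; A4; 11045; 1
26; C2; 10003; 0; A5; 13420; 2
"""

# Read everything as text so the value column is uniform after melting
df = pd.read_csv(io.StringIO(raw), sep=";", skipinitialspace=True, dtype=str)
df = df.rename(columns={"study id": "id", "freeze-thaw cycles": "count"})
df = df.astype({"id": int, "count": int})

# "1,2" becomes ["1", "2"]; single codes become one-element lists
df["project code"] = df["project code"].str.split(",")

tidy = (
    df.melt(id_vars=["id", "count"], var_name="field", value_name="value")
      .explode("value")                 # one row per list element, scalars untouched
      .dropna(subset=["value"])         # drops rows whose project code was missing
      .sort_values("id", kind="stable") # group by id, keep the field order within each id
      .reset_index(drop=True)
)[["id", "field", "count", "value"]]

print(tidy.to_csv(sep=";", index=False))
```

Melting before exploding matters here: exploding "project code" first would duplicate the other fields of every row that has two codes.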

Thanks!

Cheers, Birgit

0 answers:

No answers yet