Split a string into two different columns based on keywords in pandas?

Time: 2017-09-06 09:45:02

Tags: python regex pandas dataframe

I have a dataset with a column 'Useful_crit' that holds strings (dtype "object").

Pat_ID   Useful_crit
  1      **inclusive range**:age 35 to 75 - type 2 diabetes **exclusive range**: type 1 diabetes
  2      **inclusive range**:patients aged 21 and above **exclusive range**:patients who are mentally

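Every string in this column contains the same two keywords: "inclusive range" and "exclusive range". I want to create two columns, "inclusive range" and "exclusive range", from that one string, so that the output looks like this: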

Pat_ID   inclusive range                         exclusive range
 1       age 35 to 75 - type 2 diabetes     type 1 diabetes    
 2       patients aged 21 and above         patients who are mentally

How can I do this in Python?

1 answer:

Answer 0: (score: 0)

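Here's one way, assuming the markers are written as **inclusive**: and **exclusive**: (as in the output shown in this answer):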

In [2519]: (df.Useful_crit.str.split(r'(\**inclusive\**:|\**exclusive\**:)')
              .apply(pd.Series)[[2,4]])
Out[2519]:
                                 2                          4
0  age 35 to 75 - type 2 diabetes             type 1 diabetes
1      patients aged 21 and above   patients who are mentally
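
To see why columns 2 and 4 are the ones selected, the sketch below runs the same pattern through the standard-library `re.split` on a single string. The sample text is copied from the first row in the "**inclusive**:" form this answer uses; the variable names are only illustrative. Because the pattern is wrapped in a capturing group, the matched markers are kept in the result, so the useful text lands at the even positions after the empty leading piece.

```python
import re

# Sample string in the "**inclusive**:" form used by this answer.
text = "**inclusive**:age 35 to 75 - type 2 diabetes **exclusive**: type 1 diabetes"

# The capturing group (...) makes re.split keep the matched markers as items.
pattern = r"(\**inclusive\**:|\**exclusive\**:)"
parts = re.split(pattern, text)

# Index 0 is the empty text before the first marker, 1 and 3 are the markers,
# 2 and 4 are the pieces that follow each marker.
for i, part in enumerate(parts):
    print(i, repr(part))
```

The pieces keep their surrounding whitespace, so a `.str.strip()` on each resulting column may be worth adding if exact values matter.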

In [2520]: df.join(df.Useful_crit.str.split(r'(\**inclusive\**:|\**exclusive\**:)')
                     .apply(pd.Series)[[2,4]]
                     .rename(columns={2: 'inclusive', 4: 'exclusive'}))
Out[2520]:
   Pat_ID                                        Useful_crit  \
0       1  **inclusive**:age 35 to 75 - type 2 diabetes *...
1       2  **inclusive**:patients aged 21 and above **exc...

                         inclusive                  exclusive
0  age 35 to 75 - type 2 diabetes             type 1 diabetes
1      patients aged 21 and above   patients who are mentally
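
The sample data in the question spells the markers as "**inclusive range**:" and "**exclusive range**:", which the pattern above does not match as written. As a hedged alternative (not the answerer's method), the sketch below uses `Series.str.extract` with named groups and an optional " range" suffix, so either spelling is accepted. The DataFrame is rebuilt from the question's two rows, and the column names `inclusive` and `exclusive` are chosen here for illustration.

```python
import pandas as pd

# Rebuild the two sample rows from the question.
df = pd.DataFrame({
    "Pat_ID": [1, 2],
    "Useful_crit": [
        "**inclusive range**:age 35 to 75 - type 2 diabetes "
        "**exclusive range**: type 1 diabetes",
        "**inclusive range**:patients aged 21 and above "
        "**exclusive range**:patients who are mentally",
    ],
})

# Named groups become column names; "(?: range)?" accepts both marker spellings.
# This assumes the inclusive marker always comes before the exclusive one.
pattern = (
    r"\**inclusive(?: range)?\**:\s*(?P<inclusive>.*?)\s*"
    r"\**exclusive(?: range)?\**:\s*(?P<exclusive>.*)"
)

result = df.join(df["Useful_crit"].str.extract(pattern))
print(result[["Pat_ID", "inclusive", "exclusive"]])
```

Rows where the pattern does not match get NaN in both new columns, which makes malformed entries easy to spot.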