Split a numeric ID column into two using pandas

Time: 2017-11-18 20:00:07

Tags: python pandas data-cleaning

              DateTime  Junction  Vehicles           ID
0  2015-11-01 00:00:00         1        15  20151101001
1  2015-11-01 01:00:00         1        13  20151101011
2  2015-11-01 02:00:00         1        10  20151101021
3  2015-11-01 03:00:00         1         7  20151101031
4  2015-11-01 04:00:00         1         9  20151101041
5  2015-11-01 05:00:00         1         6  20151101051
6  2015-11-01 06:00:00         1         9  20151101061
7  2015-11-01 07:00:00         1         8  20151101071
8  2015-11-01 08:00:00         1        11  20151101081
9  2015-11-01 09:00:00         1        12  20151101091

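I want to split the ID column into two separate columns, with the first 4 digits in one and the remaining digits in the other.

The code I tried: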

new_ID = data.apply(lambda x: x.rsplit(4))

But it doesn't work. How can I do this with pandas?

3 answers:

Answer 0: (score: 2)

Option 1
The simplest and most direct: use the str accessor.

v = df.ID.astype(str)
df['Year'], df['ID'] = v.str[:4], v.str[4:]

df

              DateTime  Junction  Vehicles       ID  Year
0 2015-11-01  00:00:00         1        15  1101001  2015
1 2015-11-01  01:00:00         1        13  1101011  2015
2 2015-11-01  02:00:00         1        10  1101021  2015
3 2015-11-01  03:00:00         1         7  1101031  2015
4 2015-11-01  04:00:00         1         9  1101041  2015
5 2015-11-01  05:00:00         1         6  1101051  2015
6 2015-11-01  06:00:00         1         9  1101061  2015
7 2015-11-01  07:00:00         1         8  1101071  2015
8 2015-11-01  08:00:00         1        11  1101081  2015
9 2015-11-01  09:00:00         1        12  1101091  2015
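
For anyone who wants to try Option 1 without the original data, here is a minimal self-contained sketch. The three-row frame is a stand-in built from the question's sample IDs, and the final cast to int is an optional extra, not part of the answer above.

```python
import pandas as pd

# Stand-in for the question's data; only the ID column matters here.
df = pd.DataFrame({"ID": [20151101001, 20151101011, 20151101021]})

# Cast to string so positional slicing is available through the .str accessor.
s = df["ID"].astype(str)

# The first four characters go to Year, the rest replace ID.
df["Year"] = s.str[:4]
df["ID"] = s.str[4:]

# Both columns now hold strings; cast back if integers are needed.
df["Year"] = df["Year"].astype(int)
```

The remainder is left as a string here so that any leading zeros in it would survive.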

Option 2
Use str.extract with named groups.

v = df.ID.astype(str).str.extract(r'(?P<Year>\d{4})(?P<ID>.*)', expand=True)
df = pd.concat([df.drop(columns='ID'), v], axis=1)

df

              DateTime  Junction  Vehicles  Year       ID
0 2015-11-01  00:00:00         1        15  2015  1101001
1 2015-11-01  01:00:00         1        13  2015  1101011
2 2015-11-01  02:00:00         1        10  2015  1101021
3 2015-11-01  03:00:00         1         7  2015  1101031
4 2015-11-01  04:00:00         1         9  2015  1101041
5 2015-11-01  05:00:00         1         6  2015  1101051
6 2015-11-01  06:00:00         1         9  2015  1101061
7 2015-11-01  07:00:00         1         8  2015  1101071
8 2015-11-01  08:00:00         1        11  2015  1101081
9 2015-11-01  09:00:00         1        12  2015  1101091
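
A minimal runnable sketch of the str.extract approach, assuming a small stand-in frame. The name result and the use of join instead of pd.concat are illustrative choices, not taken from the answer.

```python
import pandas as pd

# Stand-in frame with one extra column to show that it is preserved.
df = pd.DataFrame({"Junction": [1, 1], "ID": [20151101001, 20151101011]})

# Named groups become the column names of the extracted frame.
# Rows that do not match the pattern would come back as NaN.
parts = df["ID"].astype(str).str.extract(r"(?P<Year>\d{4})(?P<ID>\d+)")

# Drop the original ID column and attach the two extracted columns by index.
result = df.drop(columns="ID").join(parts)
```

Compared with plain slicing, the regular expression also validates the input: an ID that does not start with four digits produces missing values rather than a silently wrong split.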

Answer 1: (score: 1)

Here is a numeric solution (assuming every value in the ID column has the same number of digits):

In [10]: df['Year'], df['ID'] = df['ID'] // 10**7, df['ID'] % 10**7

In [11]: df
Out[11]:
              DateTime  Junction  Vehicles       ID  Year
0 2015-11-01  00:00:00         1        15  1101001  2015
1 2015-11-01  01:00:00         1        13  1101011  2015
2 2015-11-01  02:00:00         1        10  1101021  2015
3 2015-11-01  03:00:00         1         7  1101031  2015
4 2015-11-01  04:00:00         1         9  1101041  2015
5 2015-11-01  05:00:00         1         6  1101051  2015
6 2015-11-01  06:00:00         1         9  1101061  2015
7 2015-11-01  07:00:00         1         8  1101071  2015
8 2015-11-01  08:00:00         1        11  1101081  2015
9 2015-11-01  09:00:00         1        12  1101091  2015
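
The constant 10**7 comes from the ID length: the sample IDs have 11 digits, so removing the 4 leading digits leaves 7. A small sketch that makes this explicit, where n_digits and divisor are hypothetical names and the length check is an added safeguard:

```python
import pandas as pd

df = pd.DataFrame({"ID": [20151101001, 20151101011, 20151101021]})

n_digits = 11                    # assumed fixed ID length, as in the sample data
divisor = 10 ** (n_digits - 4)   # 10**7: everything after the first 4 digits

# Guard against IDs of a different length, which would split incorrectly.
assert df["ID"].astype(str).str.len().eq(n_digits).all()

# The right-hand side is evaluated before assignment,
# so both expressions see the original ID values.
df["Year"], df["ID"] = df["ID"] // divisor, df["ID"] % divisor
```

Both resulting columns stay integers, so any leading zeros in the remainder would be lost; the string-based options avoid that.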

Answer 2: (score: -1)

# id_col is the name of the ID column, e.g. 'ID'; [:4] takes the first four digits
df[id_col].map(lambda x: int(str(x)[:4]))  # as an integer
df[id_col].map(lambda x: str(x)[:4])  # as a string