Question

              DateTime  Junction  Vehicles           ID
0  2015-11-01 00:00:00         1        15  20151101001
1  2015-11-01 01:00:00         1        13  20151101011
2  2015-11-01 02:00:00         1        10  20151101021
3  2015-11-01 03:00:00         1         7  20151101031
4  2015-11-01 04:00:00         1         9  20151101041
5  2015-11-01 05:00:00         1         6  20151101051
6  2015-11-01 06:00:00         1         9  20151101061
7  2015-11-01 07:00:00         1         8  20151101071
8  2015-11-01 08:00:00         1        11  20151101081
9  2015-11-01 09:00:00         1        12  20151101091

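I want to split the ID column into two separate columns: the first 4 digits in one and the remaining digits in the other.

The code I tried:

new_ID = data.apply(lambda x: x.rsplit(4))

But it doesn't work. How can I do this with pandas?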

Answer 1

Option 1
The simplest and most direct approach: use the str accessor.

v = df.ID.astype(str)
df['Year'], df['ID'] = v.str[:4], v.str[4:]

df

              DateTime  Junction  Vehicles       ID  Year
0 2015-11-01  00:00:00         1        15  1101001  2015
1 2015-11-01  01:00:00         1        13  1101011  2015
2 2015-11-01  02:00:00         1        10  1101021  2015
3 2015-11-01  03:00:00         1         7  1101031  2015
4 2015-11-01  04:00:00         1         9  1101041  2015
5 2015-11-01  05:00:00         1         6  1101051  2015
6 2015-11-01  06:00:00         1         9  1101061  2015
7 2015-11-01  07:00:00         1         8  1101071  2015
8 2015-11-01  08:00:00         1        11  1101081  2015
9 2015-11-01  09:00:00         1        12  1101091  2015
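For anyone who wants to run Option 1 end to end, here is a minimal sketch. The small frame is rebuilt by hand from the first three rows of the question, and converting Year back to an integer is an extra step that the answer itself does not take.

```python
import pandas as pd

# Sample frame rebuilt from the first three rows of the question (an assumption).
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2015-11-01 00:00:00",
        "2015-11-01 01:00:00",
        "2015-11-01 02:00:00",
    ]),
    "Junction": [1, 1, 1],
    "Vehicles": [15, 13, 10],
    "ID": [20151101001, 20151101011, 20151101021],
})

# Convert once to strings, then slice every value by position.
v = df["ID"].astype(str)
df["Year"] = v.str[:4].astype(int)  # first four digits, back to an integer
df["ID"] = v.str[4:]                # remaining digits, kept as strings

print(df)
```

Keeping the remainder as a string preserves any leading zeros (for example a January date such as 0101001), which an integer column would drop.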

Option 2
str.extract

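# Raw string so that \d is not treated as a string escape
v = df.ID.astype(str).str.extract(r'(?P<Year>\d{4})(?P<ID>.*)', expand=True)
# Keyword arguments: positional axis arguments are no longer accepted by recent pandas
df = pd.concat([df.drop(columns='ID'), v], axis=1)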

df

              DateTime  Junction  Vehicles  Year       ID
0 2015-11-01  00:00:00         1        15  2015  1101001
1 2015-11-01  01:00:00         1        13  2015  1101011
2 2015-11-01  02:00:00         1        10  2015  1101021
3 2015-11-01  03:00:00         1         7  2015  1101031
4 2015-11-01  04:00:00         1         9  2015  1101041
5 2015-11-01  05:00:00         1         6  2015  1101051
6 2015-11-01  06:00:00         1         9  2015  1101061
7 2015-11-01  07:00:00         1         8  2015  1101071
8 2015-11-01  08:00:00         1        11  2015  1101081
9 2015-11-01  09:00:00         1        12  2015  1101091
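A self-contained sketch of the str.extract approach follows. The two-column frame, the group name Rest, and the deliberately short third ID are assumptions made for illustration; the short value shows that rows which do not match the pattern come back as missing values rather than raising an error.

```python
import pandas as pd

# Hypothetical sample: two IDs from the question plus one that is too short.
df = pd.DataFrame({
    "Junction": [1, 1, 1],
    "ID": [20151101001, 20151101011, 123],
})

# Named groups become the column names of the extracted frame.
parts = df["ID"].astype(str).str.extract(r"(?P<Year>\d{4})(?P<Rest>\d*)")

# Swap the original ID column for the two extracted columns.
out = pd.concat([df.drop(columns="ID"), parts], axis=1)

print(out)
```

Both extracted columns hold strings, so an explicit conversion is still needed if Year should be numeric.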

Answer 2

Here is a numeric solution (assuming every ID has the same number of digits, 11 here):

In [10]: df['Year'], df['ID'] = df['ID'] // 10**7, df['ID'] % 10**7

In [11]: df
Out[11]:
              DateTime  Junction  Vehicles       ID  Year
0 2015-11-01  00:00:00         1        15  1101001  2015
1 2015-11-01  01:00:00         1        13  1101011  2015
2 2015-11-01  02:00:00         1        10  1101021  2015
3 2015-11-01  03:00:00         1         7  1101031  2015
4 2015-11-01  04:00:00         1         9  1101041  2015
5 2015-11-01  05:00:00         1         6  1101051  2015
6 2015-11-01  06:00:00         1         9  1101061  2015
7 2015-11-01  07:00:00         1         8  1101071  2015
8 2015-11-01  08:00:00         1        11  1101081  2015
9 2015-11-01  09:00:00         1        12  1101091  2015
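If the IDs do not all have the same number of digits, the divisor can be computed per row instead of hard-coding 10**7. This is only a sketch under the assumptions that every ID is a positive integer with at least four digits, and that the column names Year and Rest are acceptable; the second sample ID is made up.

```python
import pandas as pd

# Hypothetical sample: one 11-digit ID from the question and one shorter ID.
df = pd.DataFrame({"ID": [20151101001, 2016123]})

# Number of digits in each ID (assumes positive integers with >= 4 digits).
n_digits = df["ID"].astype(str).str.len()

# Per-row power of ten that separates the first four digits from the rest.
divisor = 10 ** (n_digits - 4)

df["Year"] = df["ID"] // divisor  # leading four digits
df["Rest"] = df["ID"] % divisor   # everything after them, as an integer

print(df)
```

Because the remainder is an integer here, leading zeros in the trailing part are not preserved.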

Answer 3

id_col = 'ID'
df[id_col].map(lambda x: int(str(x)[:4]))  # first four digits as an integer
df[id_col].map(lambda x: str(x)[:4])       # first four digits as a string
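The same element-wise idea can return both pieces at once. In the sketch below, split_id is a hypothetical helper (not a pandas function) and the column names Year and Rest are assumptions.

```python
import pandas as pd

def split_id(value, n=4):
    """Return (first n digits as an int, remaining digits as a str)."""
    s = str(value)
    return int(s[:n]), s[n:]

df = pd.DataFrame({"ID": [20151101001, 20151101011]})

# map calls split_id on every value; the resulting tuples become two columns.
parts = pd.DataFrame(
    df["ID"].map(split_id).tolist(),
    index=df.index,
    columns=["Year", "Rest"],
)
out = df.join(parts)

print(out)
```

This runs a Python function per row, so on large frames it will generally be slower than the vectorised str or arithmetic approaches in the other answers.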

Splitting a numeric ID column into two with pandas
