DateTime Junction Vehicles ID
0 2015-11-01 00:00:00 1 15 20151101001
1 2015-11-01 01:00:00 1 13 20151101011
2 2015-11-01 02:00:00 1 10 20151101021
3 2015-11-01 03:00:00 1 7 20151101031
4 2015-11-01 04:00:00 1 9 20151101041
5 2015-11-01 05:00:00 1 6 20151101051
6 2015-11-01 06:00:00 1 9 20151101061
7 2015-11-01 07:00:00 1 8 20151101071
8 2015-11-01 08:00:00 1 11 20151101081
9 2015-11-01 09:00:00 1 12 20151101091
I want to split the ID column into two separate columns: the first 4 digits in one, and the remaining digits in the other.
The code I tried:
new_ID = data.apply(lambda x: x.rsplit(4))
But it doesn't work. How can I do this with pandas?
Answer 0 (score: 2)
Option 1
The simplest and most direct approach: use the str accessor.
v = df.ID.astype(str)
df['Year'], df['ID'] = v.str[:4], v.str[4:]
df
DateTime Junction Vehicles ID Year
0 2015-11-01 00:00:00 1 15 1101001 2015
1 2015-11-01 01:00:00 1 13 1101011 2015
2 2015-11-01 02:00:00 1 10 1101021 2015
3 2015-11-01 03:00:00 1 7 1101031 2015
4 2015-11-01 04:00:00 1 9 1101041 2015
5 2015-11-01 05:00:00 1 6 1101051 2015
6 2015-11-01 06:00:00 1 9 1101061 2015
7 2015-11-01 07:00:00 1 8 1101071 2015
8 2015-11-01 08:00:00 1 11 1101081 2015
9 2015-11-01 09:00:00 1 12 1101091 2015
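For anyone who wants to run Option 1 without the original data, here is a minimal self-contained sketch. The three-row frame is only a stand-in for the question's data, and the final integer cast is an optional extra that is not part of the answer above.

```python
import pandas as pd

# Small stand-in for the question's data (three rows, for brevity).
df = pd.DataFrame({
    "Vehicles": [15, 13, 10],
    "ID": [20151101001, 20151101011, 20151101021],
})

# Convert to string first: the .str accessor only works on string data.
id_text = df["ID"].astype(str)

# Slice by position: the first four characters, then everything after them.
df["Year"] = id_text.str[:4]
df["ID"] = id_text.str[4:]

# Both new columns hold strings at this point; cast if integers are needed.
df["Year"] = df["Year"].astype(int)
```

Keeping the remainder as a string preserves any leading zeros it may contain.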
Option 2
Use str.extract with named groups.
v = df.ID.astype(str).str.extract(r'(?P<Year>\d{4})(?P<ID>.*)', expand=True)
df = pd.concat([df.drop(columns='ID'), v], axis=1)
df
DateTime Junction Vehicles Year ID
0 2015-11-01 00:00:00 1 15 2015 1101001
1 2015-11-01 01:00:00 1 13 2015 1101011
2 2015-11-01 02:00:00 1 10 2015 1101021
3 2015-11-01 03:00:00 1 7 2015 1101031
4 2015-11-01 04:00:00 1 9 2015 1101041
5 2015-11-01 05:00:00 1 6 2015 1101051
6 2015-11-01 06:00:00 1 9 2015 1101061
7 2015-11-01 07:00:00 1 8 2015 1101071
8 2015-11-01 08:00:00 1 11 2015 1101081
9 2015-11-01 09:00:00 1 12 2015 1101091
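A runnable sketch of Option 2 on a tiny stand-in frame may help show where the column names come from. The two-row data and the variable names `parts` and `result` are assumptions made for illustration.

```python
import pandas as pd

# Tiny stand-in for the question's data.
df = pd.DataFrame({"Junction": [1, 1], "ID": [20151101001, 20151101011]})

# Each named group in the pattern becomes a column in the returned frame:
# the first four digits go to "Year", whatever follows goes to "ID".
parts = df["ID"].astype(str).str.extract(r"(?P<Year>\d{4})(?P<ID>.*)", expand=True)

# Drop the original ID column and attach the two extracted columns.
result = pd.concat([df.drop(columns="ID"), parts], axis=1)
```

Both extracted columns are strings. Rows that do not match the pattern would come back as missing values, which can be a convenient way to spot malformed IDs.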
Answer 1 (score: 1)
Here is a numeric solution (assuming every value in the ID column has the same number of digits):
In [10]: df['Year'], df['ID'] = df['ID'] // 10**7, df['ID'] % 10**7
In [11]: df
Out[11]:
DateTime Junction Vehicles ID Year
0 2015-11-01 00:00:00 1 15 1101001 2015
1 2015-11-01 01:00:00 1 13 1101011 2015
2 2015-11-01 02:00:00 1 10 1101021 2015
3 2015-11-01 03:00:00 1 7 1101031 2015
4 2015-11-01 04:00:00 1 9 1101041 2015
5 2015-11-01 05:00:00 1 6 1101051 2015
6 2015-11-01 06:00:00 1 9 1101061 2015
7 2015-11-01 07:00:00 1 8 1101071 2015
8 2015-11-01 08:00:00 1 11 1101081 2015
9 2015-11-01 09:00:00 1 12 1101091 2015
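The constant 10**7 in the numeric solution comes from the IDs having 11 digits, 4 of which are the year. The sketch below derives the divisor from the data instead of hard-coding it. It still relies on the answer's assumption that every ID has the same length, and the names `n_digits` and `divisor` are illustrative.

```python
import pandas as pd

# Small stand-in for the question's data.
df = pd.DataFrame({"ID": [20151101001, 20151101011, 20151101021]})

# Assumption: every ID has the same number of digits (11 here),
# so the 4-digit year is followed by exactly n_digits - 4 digits.
n_digits = len(str(df["ID"].iloc[0]))
divisor = 10 ** (n_digits - 4)

# Integer division keeps the leading digits; the remainder keeps the rest.
year = df["ID"] // divisor
rest = df["ID"] % divisor
df["Year"], df["ID"] = year, rest
```

One trade-off of the numeric route: if the part after the year began with a zero, that zero would be lost because the result is an integer. The string-based options keep it.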
Answer 2 (score: -1)
# id_col is the name of the ID column (presumably 'ID' here)
df[id_col].map(lambda x: int(str(x)[:4])) # first 4 digits as an integer
df[id_col].map(lambda x: str(x)[:4]) # first 4 digits as a string