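The title might be a bit off, so let me explain properly. I will be receiving a DataFrame named df in which one of the columns, marker, arrives in an unexpected format.
Sometimes marker contains a mix of single values and ranges:
marker       place1  place2
45           PQR     STU
145.0-100    ABC     DEF
267.0-175.8  GHI     KLM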
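During the transformation, I need to split every marker that contains - so that the result looks like this:
marker       firstkm  lastkm  place1  place2
45           45       NaN     PQR     STU
145.0-100    145.0    100     ABC     DEF
267.0-175.8  267.0    175.8   GHI     KLM
I can also receive a DataFrame like the one below, where none of the marker values is a range:
marker  place1  place2
145.0   ABC     DEF
267.0   GHI     KLM
I am using this code:
# Split marker into a temporary DataFrame, split_m
split_m = df.marker.str.split('-', expand=True)
split_m.columns = ['firstkm', 'lastkm']  # hitting error here
split_m = split_m[['firstkm', 'lastkm']].replace([None], np.nan)
If the DataFrame looks like the first example above, I get the desired result. If it looks like the second example, I do not, and I receive this error:
ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 elements
I understand that the error occurs because the DataFrame has no values to put in lastkm, but I don't know how to handle it.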
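The length mismatch can be reproduced in isolation. The sketch below uses two small made-up frames that mirror the examples above, and assumes the marker column holds strings. It shows that str.split with expand=True creates only as many columns as the longest split, so a frame without any range yields a single column.

```python
import pandas as pd

# Two small illustrative frames mirroring the examples in the question.
mixed = pd.DataFrame({"marker": ["45", "145.0-100", "267.0-175.8"]})
singles = pd.DataFrame({"marker": ["145.0", "267.0"]})

# expand=True creates one column per piece of the longest split.
mixed_split = mixed["marker"].str.split("-", expand=True)
singles_split = singles["marker"].str.split("-", expand=True)

print(mixed_split.shape)    # two columns, because a range was present
print(singles_split.shape)  # one column, because no "-" occurs anywhere

# Assigning two names to a one-column frame raises the ValueError.
try:
    singles_split.columns = ["firstkm", "lastkm"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```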
If I print split_m for the second DataFrame, I get this:
marker  firstkm
145.0   145.0
267.0   267.0
How can I immediately assign np.nan to lastkm to produce the following result?
marker  firstkm  lastkm
145.0   145.0    NaN
267.0   267.0    NaN
EDIT
Another pattern I came across:
marker       firstkm     lastkm  place1  place2
45           45          NaN     PQR     STU
145.0-100    145.0       100     ABC     DEF
267.0-175.8  267.0       175.8   GHI     KLM
18.1J        18.1J       NaN     GHI     KLM
P7.991-54.3  P7.991      54.3    GHI     KLM
UPM Ex 0.5   UPM Ex 0.5  NaN     PPP     SSS
These are still acceptable. Case does not matter.
Answer 0 (score: 1)
You can try the following:
import numpy as np
import pandas as pd

# create a copy of the original df, split_m
split_m = df.copy()
# create the additional required columns with default NaN values
split_m.insert(1, 'firstkm', np.nan)
split_m.insert(2, 'lastkm', np.nan)
# unpack the split values into the new columns; if there is nothing
# to unpack for 'lastkm', it becomes None
split_m[['firstkm', 'lastkm']] = df.marker.str.split('-', expand=True)
# replace the None values with np.nan
split_m.fillna(np.nan, inplace=True)
print(split_m)
Output:
   marker       firstkm  lastkm  place1  place2
0  45           45       NaN     PQR     STU
1  145.0-100    145.0    100     ABC     DEF
2  267.0-175.8  267.0    175.8   GHI     KLM
3  145.0        145.0    NaN     ABC     DEF
4  267.0        267.0    NaN     GHI     KLM
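One caveat: the two-column assignment above relies on the split producing exactly two columns, which holds for the mixed data shown but, as far as I can tell, not for a frame in which no marker contains a hyphen. A minimal sketch of one way to guard against that is to force the split result to two columns with reindex. The sample frame and the name parts are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical frame in which no marker is a range.
df = pd.DataFrame(
    {
        "marker": ["145.0", "267.0"],
        "place1": ["ABC", "GHI"],
        "place2": ["DEF", "KLM"],
    }
)

# Split on the first "-" only, then force exactly two columns (0 and 1).
# reindex adds the missing column 1 and fills it with NaN.
parts = (
    df["marker"]
    .astype(str)
    .str.split("-", n=1, expand=True)
    .reindex(columns=[0, 1])
)
parts.columns = ["firstkm", "lastkm"]

# Insert the two new columns right after marker.
split_m = df.copy()
split_m.insert(1, "firstkm", parts["firstkm"])
split_m.insert(2, "lastkm", parts["lastkm"])
print(split_m)
```

Passing n=1 also keeps a marker with more than one hyphen from producing a third column; everything after the first hyphen would then land in lastkm.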
New solution for the changed input:
# tab-separated data for read_clipboard()
# please make sure that your source data
# uses a separator other than a space.
'''
marker place1 place2
45 PQR STU
145.0-100 ABC DEF
267.0-175.8 GHI KLM
145.0 ABC DEF
267.0 GHI KLM
P7.991-54.3 GHI KLM
UPM Ex 0.5 PPP SSS
'''
import pandas as pd
import numpy as np
df = pd.read_clipboard()
# split marker into a temporary DataFrame, split_m
split_m = df.copy()
# create the additional required columns with default NaN values
split_m.insert(1, 'firstkm', np.nan)
split_m.insert(2, 'lastkm', np.nan)
# unpack the split values into the new columns; if there is nothing
# to unpack for 'lastkm', it becomes None
split_m[['firstkm', 'lastkm']] = df.marker.str.split('-', expand=True)
# replace the None values with np.nan
split_m.fillna(np.nan, inplace=True)
print(split_m)
Output:
   marker       firstkm     lastkm  place1  place2
0  45           45          NaN     PQR     STU
1  145.0-100    145.0       100     ABC     DEF
2  267.0-175.8  267.0       175.8   GHI     KLM
3  145.0        145.0       NaN     ABC     DEF
4  267.0        267.0       NaN     GHI     KLM
5  P7.991-54.3  P7.991      54.3    GHI     KLM
6  UPM Ex 0.5   UPM Ex 0.5  NaN     PPP     SSS
Answer 1 (score: 0)
Use str.extract:
# raw string so that \d and \. reach the regex engine unchanged
print(df["marker"].str.extract(r"(?P<Start>\d+\.?\d+?)-?(?P<End>\d+\.?\d+?)?"))
   Start    End
0     45    NaN
1  145.0    100
2  267.0  175.8
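The pattern above needs at least two digit characters per number and captures the fractional digits lazily, so a marker such as 5 or 267.05-175.25 would not be extracted as intended. The sketch below shows an alternative pattern, offered as an assumption rather than as the answer author's method, in which the decimal part and the whole end value are optional groups. The extra sample rows are made up for illustration.

```python
import pandas as pd

# Sample markers, including two made-up rows that stress the pattern.
df = pd.DataFrame(
    {"marker": ["45", "145.0-100", "267.0-175.8", "5", "267.05-175.25"]}
)

# Start: digits with an optional decimal part.
# End: the same shape, present only when a "-" follows Start.
pattern = r"(?P<Start>\d+(?:\.\d+)?)(?:-(?P<End>\d+(?:\.\d+)?))?"

extracted = df["marker"].str.extract(pattern)
print(extracted)
```

This still captures only the numeric parts, so alphanumeric markers such as P7.991 or UPM Ex 0.5 from the edit are better served by the split-based approach.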