Assign NaN to a column based on another column

Time: 2020-05-29 04:17:29

Tags: python pandas dataframe

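The title might be a bit off, so let me explain properly. I will receive a DataFrame named df in which one column, marker, arrives in an unexpected format.

Sometimes marker contains a mix of single markers and range markers: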

marker             place1       place2
45                  PQR           STU
145.0-100           ABC           DEF
267.0-175.8         GHI           KLM

During the transformation, I need to split every marker that contains '-' and turn it into this:

marker         firstkm   lastkm   place1   place2
45             45        NaN      PQR      STU
145.0-100      145.0     100      ABC      DEF
267.0-175.8    267.0     175.8    GHI      KLM

I can also get the following DataFrame, where none of the marker values is a range:

marker    place1       place2
145.0      ABC           DEF
267.0      GHI           KLM

I am using this code:

#Split marker to temporary dataframe , split_m
split_m = df.marker.str.split('-', expand=True)
split_m.columns=['firstkm', 'lastkm'] #hitting error here
split_m = split_m[['firstkm', 'lastkm']].replace([None], np.nan)

If the DataFrame looks like the first example above, I get the desired result. If it looks like the second example, I do not. I get this error:

ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 elements

I know the error occurs because the DataFrame has no values to put in lastkm, but I don't know how to handle it.

If I print split_m for the second DataFrame, I get this:

marker    firstkm
145.0     145.0
267.0     267.0

How can I directly assign np.nan to lastkm to produce the following result?

marker     firstkm   lastkm
145.0       145.0     NaN
267.0       267.0     NaN

Edit

Another pattern I have encountered:

marker         firstkm       lastkm   place1   place2
45             45            NaN      PQR      STU
145.0-100      145.0         100      ABC      DEF
267.0-175.8    267.0         175.8    GHI      KLM
18.1J          18.1J         Nan      GHI      KLM
P7.991-54.3    P7.991        54.3     GHI      KLM
UPM Ex 0.5     UPM Ex 0.5    NaN      PPP      SSS

Nan is still acceptable. It is not case-sensitive.
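The length mismatch can be reproduced with a minimal sketch. The frame below is a made-up stand-in for the second example, and the variable name message is only for illustration. When no value contains the separator, str.split with expand=True returns a single column, so there is nothing for the second column name to attach to.

```python
import pandas as pd

# Hypothetical stand-in for the second example: no marker is a range
df = pd.DataFrame({
    "marker": ["145.0", "267.0"],
    "place1": ["ABC", "GHI"],
    "place2": ["DEF", "KLM"],
})

# No value contains '-', so the expanded result has only one column
split_m = df["marker"].str.split("-", expand=True)
print(split_m.shape)  # (2, 1)

message = ""
try:
    # Two names for one column: this is the line that raises
    split_m.columns = ["firstkm", "lastkm"]
except ValueError as exc:
    message = str(exc)

print(message)
```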

2 Answers:

Answer 0 (score: 1)

You can try the following:

# create a copy of the original df, split_m
split_m = df.copy()

# create the additional required columns with default 'NaN' values
split_m.insert(1, 'firstkm', np.nan)
split_m.insert(2, 'lastkm', np.nan)

# unpack the split values into the columns. If there is nothing
# to unpack for 'lastkm', it will become None
split_m[['firstkm', 'lastkm']] = df.marker.str.split('-', expand=True)
# fill None values with np.nan
split_m.fillna(np.nan, inplace=True)

print(split_m)

Output:

        marker firstkm lastkm place1 place2
0           45      45    NaN    PQR    STU
1    145.0-100   145.0    100    ABC    DEF
2  267.0-175.8   267.0  175.8    GHI    KLM
3        145.0   145.0    NaN    ABC    DEF
4        267.0   267.0    NaN    GHI    KLM
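One caveat: the assignment above relies on at least one row containing '-'. If every marker is a single value, the split yields one column and a two-column assignment would fail. A minimal sketch of one way around that, assuming it is acceptable to cast marker to str, is to reindex the split result to exactly two columns. The helper name split_marker and the sample frames are hypothetical.

```python
import pandas as pd


def split_marker(df):
    """Return a copy of df with firstkm/lastkm inserted after marker."""
    parts = (
        df["marker"]
        .astype(str)                       # guard against a numeric marker column
        .str.split("-", n=1, expand=True)  # at most one split, so at most two pieces
        .reindex(columns=[0, 1])           # force two columns; a missing one becomes NaN
    )
    out = df.copy()
    out.insert(1, "firstkm", parts[0])
    out.insert(2, "lastkm", parts[1])
    return out


# Hypothetical sample data: one mixed frame, one with single markers only
mixed = pd.DataFrame({
    "marker": ["45", "145.0-100", "267.0-175.8"],
    "place1": ["PQR", "ABC", "GHI"],
    "place2": ["STU", "DEF", "KLM"],
})
single = pd.DataFrame({
    "marker": ["145.0", "267.0"],
    "place1": ["ABC", "GHI"],
    "place2": ["DEF", "KLM"],
})

print(split_marker(mixed))
print(split_marker(single))
```

Missing pieces may show up as None or NaN depending on the pandas version; isna() treats both as missing.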

New scenario with the changed input:

# tab-separated data for read_clipboard()
# please make sure that your source data
# has a separator other than space.
'''
marker  place1  place2
45  PQR STU
145.0-100   ABC DEF
267.0-175.8 GHI KLM
145.0   ABC DEF
267.0   GHI KLM
P7.991-54.3 GHI KLM
UPM Ex 0.5  PPP SSS
'''

import pandas as pd
import numpy as np

df = pd.read_clipboard(sep='\t')

# Split marker into a temporary DataFrame, split_m
split_m = df.copy()

# create the additional required columns with default 'NaN' values
split_m.insert(1, 'firstkm', np.nan)
split_m.insert(2, 'lastkm', np.nan)

# unpack the split values into the columns. If there is nothing
# to unpack for 'lastkm', it will become None
split_m[['firstkm', 'lastkm']] = df.marker.str.split('-', expand=True)
split_m.fillna(np.nan, inplace=True)

print(split_m)

Output:

        marker     firstkm lastkm place1 place2
0           45          45    NaN    PQR    STU
1    145.0-100       145.0    100    ABC    DEF
2  267.0-175.8       267.0  175.8    GHI    KLM
3        145.0       145.0    NaN    ABC    DEF
4        267.0       267.0    NaN    GHI    KLM
5  P7.991-54.3      P7.991   54.3    GHI    KLM
6   UPM Ex 0.5  UPM Ex 0.5    NaN    PPP    SSS

Answer 1 (score: 0)

Use str.extract:

print(df["marker"].str.extract(r"(?P<Start>\d+\.?\d+?)-?(?P<End>\d+\.?\d+?)?"))

   Start    End
0     45    NaN
1  145.0    100
2  267.0  175.8
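The pattern above only covers purely numeric markers. As a sketch, a looser pattern that takes everything before the first '-' as firstkm and everything after it as lastkm would also cover the markers from the question's edit. The pattern and the sample frame are assumptions for illustration, not part of the original answer. Because the pattern has two named groups, str.extract returns two columns even when no row is a range.

```python
import pandas as pd

# Hypothetical sample built from the markers shown in the question
df = pd.DataFrame({
    "marker": ["45", "145.0-100", "18.1J", "P7.991-54.3", "UPM Ex 0.5"],
})

# firstkm: everything up to the first '-'; lastkm: optional remainder after it
pattern = r"^(?P<firstkm>[^-]+)(?:-(?P<lastkm>.+))?$"

# An unmatched optional group is reported as a missing value
extracted = df["marker"].astype(str).str.extract(pattern)
result = df.join(extracted)

print(result)
```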