Python pandas DataFrame: modify column values with a string-cleaning function and assign the result to a new column

Time: 2020-03-11 08:07:19

Tags: python pandas dataframe

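I need to clean some data. Some keys have six leading zeros that must be removed, and if such a key ends with neither "ABC" nor "DEFG", I also need to strip the currency code held in its last 3 characters. If a key does not start with a leading zero, it should simply be returned as is.

To do this, I wrote a function that processes the string, as follows: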

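def cleanAttainKey(dirtyAttainKey):
    # Keys without a leading zero are returned unchanged.
    if not dirtyAttainKey.startswith("0"):
        return dirtyAttainKey
    # Remove leading zeros only; strip("0") would also eat trailing zeros.
    key = dirtyAttainKey.lstrip("0")
    # A 3-character slice can never equal "DEFG", so compare with endswith.
    # Drop the 3-character currency code unless the key ends with ABC or DEFG.
    if not key.endswith(("ABC", "DEFG")):
        key = key[:-3]
    return key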

Now I build a dummy DataFrame to test it, but it reports an error:

  1. The DataFrame
import pandas as pd

df = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102]},
                  columns=["dirtyKey","amount"])
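  2. I need a new column in df called "cleanAttainKey": I apply the cleanAttainKey function to every value in "dirtyKey" and assign the cleaned keys to the new column "cleanAttainKey", but pandas does not seem to support this kind of modification.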
# add a new column in df called cleanAttainKey
df['cleanAttainKey'] = ""
# I want to clean the keys and get into the new column of cleanAttainKey
dirtyAttainKeyList = df['dirtyKey'].tolist()
for i in range(len(df['cleanAttainKey'])):
    df['cleanAttainKey'][i] = cleanAttainKey(dirtyAttainKeyList[i])

I get the following error message:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

The result should be the same as df2 below:

df2 = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102],
                  'cleanAttainKey':["12345ABC","12345DEFG","23456DEFG"]},
                  columns=["dirtyKey","cleanAttainKey","amount"])
df2

Is there a better way in pandas to clean the dirty keys and get a new column with the clean keys? Thanks.

1 answer:

Answer 0: (score: 1)

This is the culprit:

df['cleanAttainKey'][i] = cleanAttainKey(dirtyAttainKeyList[i])

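When you extract part of a DataFrame, pandas reserves the right to return either a copy or a view. That makes no difference if you only read the data, but it means you should never assign through such an extraction: the write may land on a temporary copy instead of the original DataFrame.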
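To make the two-step nature of that line visible, here is a minimal sketch. The two-row frame and the intermediate name `col` are made up purely for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical two-row frame, only for illustration.
df = pd.DataFrame({"dirtyKey": ["0001ABC", "0002ABC"],
                   "cleanAttainKey": ["", ""]})

# df["cleanAttainKey"][0] = "1ABC" is really two separate operations:
col = df["cleanAttainKey"]   # 1) __getitem__: may hand back a view or a copy
# col[0] = "1ABC"            # 2) __setitem__ on that intermediate object

# A single .loc call is one __setitem__ on df itself, so there is no
# intermediate object whose view-or-copy status could matter.
df.loc[0, "cleanAttainKey"] = "1ABC"
print(df)
```

Only the row addressed through `.loc` changes; the other row keeps its empty string.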

The idiomatic way is to use loc (or iloc, or at/iat for a single cell):

df.loc[i, 'cleanAttainKey'] = cleanAttainKey(dirtyAttainKeyList[i])

(The above assumes a natural range index...)
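As for the "better way" part of the question, one option that avoids the explicit loop and does not depend on the index labels at all is Series.apply. This is a minimal sketch, not a definitive implementation: `clean_key` is my own hypothetical rewrite of the asker's function, assuming the intended rules are "strip leading zeros, then drop a trailing 3-character currency code unless the key ends with ABC or DEFG", and the string index is added only to show that no range index is needed.

```python
import pandas as pd

def clean_key(key):
    # Hypothetical rewrite of cleanAttainKey under the assumptions above.
    if not key.startswith("0"):
        return key
    key = key.lstrip("0")
    if not key.endswith(("ABC", "DEFG")):
        key = key[:-3]
    return key

df = pd.DataFrame(
    {"dirtyKey": ["00000012345ABC", "0000012345DEFG", "0000023456DEFGUSD"],
     "amount": [100, 101, 102]},
    index=["a", "b", "c"],  # deliberately not a 0..n-1 range index
)

# apply calls clean_key once per value and returns a new Series aligned
# on df's index, so the assignment creates the column in one step.
df["cleanAttainKey"] = df["dirtyKey"].apply(clean_key)
print(df["cleanAttainKey"].tolist())
```

Because the whole column is assigned at once, there is no chained indexing and therefore nothing for SettingWithCopyWarning to complain about.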