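I am trying to create a new dataframe by first splitting the original into two parts:
df1 - contains only the rows of the original frame where the selected column has a value from a given list
df2 - contains only the rows of the original frame where the selected column has any other value; those values are then changed to a new given value.
The new dataframe is returned as the concatenation of df1 and df2.
This works fine: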
import pandas as pd
l1 = ['a','b','c','d','a','b']
l2 = [1,2,3,4,5,6]
df = pd.DataFrame({'cat':l1,'val':l2})
print(df)
  cat  val
0   a    1
1   b    2
2   c    3
3   d    4
4   a    5
5   b    6
df['cat'] = df['cat'].apply(lambda x: 'other')
print(df)
     cat  val
0  other    1
1  other    2
2  other    3
3  other    4
4  other    5
5  other    6
However, when I define a function:
def create_df(df, select, vals, other):
    df1 = df.loc[df[select].isin(vals)]
    df2 = df.loc[~df[select].isin(vals)]
    df2[select] = df2[select].apply(lambda x: other)
    result = pd.concat([df1, df2])
    return result
and call it:
df3 = create_df(df, 'cat', ['a','b'], 'xxx')
print(df3)
it produces exactly what I need:
   cat  val
0    a    1
1    b    2
4    a    5
5    b    6
2  xxx    3
3  xxx    4
For some reason, though, in this case I get a warning:
..\usr\conda\lib\site-packages\ipykernel\__main__.py:10: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
So how does this case (assigning values to a column inside the function) differ from the first one?
What is the correct way to change the column values?
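Answer 0 (score: 0)
There are many ways to optimize the code, but to make it work you can simply operate on copies of the data and concatenate them:
def create_df(df, select, vals, other):
    df1 = df[df[select].isin(vals)].copy()   # boolean indexing, then an explicit copy
    df2 = df[~df[select].isin(vals)].copy()  # boolean indexing on the negated mask
    df2[select] = other  # a scalar is sufficient; no apply/lambda needed
    result = pd.concat([df1, df2])
    return result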
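To illustrate why the copy makes the difference, here is a minimal sketch (the helper name `split_and_relabel` is made up for this example, and this is only one way to read the warning). In the first snippet of the question the assignment goes straight into `df` itself, whereas inside the function `df2` is a filtered subset that pandas still tracks as derived from `df`. Calling `.copy()` on the filtered result makes it an independent frame, so the assignment is unambiguous and the original frame is left untouched.

```python
import pandas as pd

df = pd.DataFrame({'cat': ['a', 'b', 'c', 'd', 'a', 'b'],
                   'val': [1, 2, 3, 4, 5, 6]})

def split_and_relabel(df, select, vals, other):
    # Hypothetical helper, not part of pandas.
    mask = df[select].isin(vals)
    kept = df[mask]            # rows whose value is in `vals` (never modified)
    rest = df[~mask].copy()    # explicit copy: safe to assign into
    rest[select] = other       # overwrite the whole column with a scalar
    return pd.concat([kept, rest])

result = split_and_relabel(df, 'cat', ['a', 'b'], 'xxx')
print(result)
print(df)  # the input frame still holds its original values
```

Only `rest` needs the copy here, because `kept` is never assigned to.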
Alternative version:
l1 = ['a','b','c','d','a','b']
l2 = [1,2,3,4,5,6]
df = pd.DataFrame({'cat':l1,'val':l2})
# define a mask
mask = df['cat'].isin(list("ab"))
# concatenate masked rows, then non-masked rows (concat returns a new frame)
df2 = pd.concat([df[mask], df[~mask]])
# change values to 'xxx'; the boolean mask is aligned on the index labels
df2.loc[~mask, "cat"] = "xxx"
Output
   cat  val
0    a    1
1    b    2
4    a    5
5    b    6
2  xxx    3
3  xxx    4
Or as a function:
def create_df(df, filter_, isin_, value):
    # define a mask
    mask = df[filter_].isin(isin_)
    # concatenate masked rows, then non-masked rows (concat returns a new frame)
    df = pd.concat([df[mask], df[~mask]])
    # change the non-matching values to `value`
    df.loc[~mask, filter_] = value
    return df
df2 = create_df(df, 'cat', ['a','b'], 'xxx')
df2
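If grouping the matching rows first is not actually required, a different technique from the one above is `Series.where`, which keeps a value where the condition is true and substitutes the replacement elsewhere. The following is a rough sketch under that assumption; the function name `relabel_others` is hypothetical.

```python
import pandas as pd

def relabel_others(df, column, keep_values, other):
    # Hypothetical helper, not part of pandas.
    out = df.copy()  # work on a copy so the caller's frame is not modified
    keep = out[column].isin(keep_values)
    # keep values where `keep` is True, use `other` everywhere else
    out[column] = out[column].where(keep, other)
    return out

df = pd.DataFrame({'cat': ['a', 'b', 'c', 'd', 'a', 'b'],
                   'val': [1, 2, 3, 4, 5, 6]})
df4 = relabel_others(df, 'cat', ['a', 'b'], 'xxx')
print(df4)
```

Unlike the concat-based versions, this leaves the rows in their original order (index 0 to 5), so it is only a substitute when that ordering is acceptable.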