Pandas: problem creating a column conditionally

Date: 2017-08-21 22:56:00

Tags: python pandas

I have a sample dataset:

import pandas as pd

df = {
    'columA': ['1A', 'ws rank', 'rank', 'ws rank', 'rank', 'Drank'],
    'value': [1, 12, 34, 50, 3, 2]
}


df = pd.DataFrame(df)

1. I want to create a column 'HP' for the rows whose columA is 'ws rank', 'rank' or 'Drank': if value is 1, HP is 25; if value is 2, HP is 24; and so on.
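So I first created a smaller dataset containing only those rows, because my real dataset is very large. Then I concatenate this dataset with the original one to bring in the 'HP' column. However, the concatenated result contains duplicate rows, so there must be a simpler way.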

My code:

dfrank=df[df["columA"].str.contains('ws rank|rank')]
dfrank['value'] = dfrank['value'].astype(int)
dfrank.loc[dfrank.value == 1, 'HP'] = 25
dfrank.loc[dfrank.value == 2, 'HP'] = 24
dfrank.loc[dfrank.value == 3, 'HP'] = 23
dfrank.loc[dfrank.value == 4, 'HP'] = 22
dfrank.loc[dfrank.value == 5, 'HP'] = 21
dfrank.loc[dfrank.value == 6, 'HP'] = 20
dfrank.loc[dfrank.value == 7, 'HP'] = 19
dfrank.loc[dfrank.value == 8, 'HP'] = 18
dfrank.loc[dfrank.value == 9, 'HP'] = 17
dfrank.loc[dfrank.value == 10, 'HP'] = 16
dfrank.loc[dfrank.value == 11, 'HP'] = 15
dfrank.loc[dfrank.value == 12, 'HP'] = 14
dfrank.loc[dfrank.value == 13, 'HP'] = 13
dfrank.loc[dfrank.value == 14, 'HP'] = 12
dfrank.loc[dfrank.value == 15, 'HP'] = 11
dfrank.loc[dfrank.value == 16, 'HP'] = 10
dfrank.loc[dfrank.value == 17, 'HP'] = 9
dfrank.loc[dfrank.value == 18, 'HP'] = 8
dfrank.loc[dfrank.value == 19, 'HP'] = 7
dfrank.loc[dfrank.value == 20, 'HP'] = 6
dfrank.loc[(dfrank.value > 20)&(dfrank.value <= 50), 'HP'] = 5

df2=pd.concat([df, dfrank])
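For reference, a minimal sketch of why the concatenation duplicates rows: pd.concat stacks its inputs along the row axis by default, so every row of dfrank appears a second time in df2. Assuming the goal is only to attach HP to the original frame, the column can be assigned back directly, because dfrank keeps df's index labels and pandas aligns on them (rows that were filtered out get NaN). The HP rule used here, 26 - value floored at 5, is an assumed shorthand that reproduces the .loc chain above for values 1 to 50.

```python
import pandas as pd

df = pd.DataFrame({
    'columA': ['1A', 'ws rank', 'rank', 'ws rank', 'rank', 'Drank'],
    'value': [1, 12, 34, 50, 3, 2],
})

# Subset as in the question; .copy() makes it an independent frame
dfrank = df[df['columA'].str.contains('ws rank|rank')].copy()
# Assumed shorthand for the .loc chain: 26 - value, never below 5
dfrank['HP'] = (26 - dfrank['value']).clip(lower=5)

# concat appends rows: 6 original rows + 5 subset rows
stacked = pd.concat([df, dfrank])
print(len(stacked))  # 11

# Index-aligned assignment instead: still one row per original row
df['HP'] = dfrank['HP']
print(df)
```

Row 0 ('1A') is not in dfrank, so its HP ends up as NaN; fill it afterwards if a default is needed.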

Is there a simpler way to express these conditions? Also, I keep getting this warning, even though I think I'm already using the form it suggests:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  dfrank['value'] = dfrank['value'].astype(int)
H:/Code/PythonScripts/python_work/dataset1.py:20: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  dfrank.loc[dfrank.value == 1, 'HP'] = 25
C:\Users\amywang\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\indexing.py:477: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s

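2. Then I want to create an 'HPpoint' column that groups by the 'columA' values and sums the 'HP' values, but this doesn't work and only returns null (NaN) values: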

df2['HPpoint']=df2.groupby('columA')['HP'].sum()

2 answers:

Answer 0 (score: 1)

...interesting.

Not sure I fully understood all of your questions, but here is my take on the first half...

import pandas as pd
df = {
    'columA': ['1A', 'ws rank', 'rank', 'ws rank', 'rank', 'Drank'],
    'value': [1, 12, 34, 50, 3, 2]
}

df = pd.DataFrame(df)

df["hp"]=0

def calc_hp(row):

    rv=0
    if row['columA'] in ['ws rank', 'rank', 'Drank']:
        rv = 25 - int(row['value'])
    return rv

df['hp'] = df.apply(calc_hp,axis=1)

df

Returns:

    columA  value   hp
0       1A      1    0
1  ws rank     12   13
2     rank     34   -9
3  ws rank     50  -25
4     rank      3   22
5    Drank      2   23

I pass the whole row to the function via apply, and then use (hopefully) the logic you specified.
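Row-wise apply calls a Python function once per row, which can get slow on a very large dataset. A vectorized sketch of the rule as the question states it (HP = 26 - value for values 1 to 20, HP = 5 for 21 to 50, and 0 for other rows, following this answer's convention): 26 - value floored at 5 with Series.clip gives exactly those numbers for values 1 to 50. calc_hp above uses 25 - value, so its numbers come out one lower. The question does not say what happens above 50; this sketch would also give those values 5, which is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'columA': ['1A', 'ws rank', 'rank', 'ws rank', 'rank', 'Drank'],
    'value': [1, 12, 34, 50, 3, 2],
})

# Rows that should get an HP value (exact label match)
is_rank = df['columA'].isin(['ws rank', 'rank', 'Drank'])

# 26 - value gives 25 for 1, 24 for 2, ..., 6 for 20;
# clip(lower=5) turns every result for values 21 and up into 5
hp_rule = (26 - df['value']).clip(lower=5)

# Keep the rule on matching rows and 0 elsewhere, as calc_hp does
df['hp'] = np.where(is_rank, hp_rule, 0)
print(df)
```

isin matches whole labels, whereas str.contains('ws rank|rank') would also match any other label that merely contains "rank".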

Answer 1 (score: 1)

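In pandas, when you select data and store it in a new variable, indexing the DataFrame may return a reference to the original DataFrame rather than an independent object. So when you want a new data frame that you will modify with .loc, you should take a copy of it, i.e.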

dfrank=df[df["columA"].str.contains('ws rank|rank')].copy()

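This creates a new, independent DataFrame, so assignments on it (such as .loc[...] = value) apply to the new data frame as intended.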
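A minimal sketch of that pattern on the question's sample data: take an explicit .copy() of the filtered rows, then modify the copy. Because the copy owns its own data, pandas has no reason to raise SettingWithCopyWarning for these assignments, and the original df is left unchanged. Note that the copy keeps the original index labels.

```python
import pandas as pd

df = pd.DataFrame({
    'columA': ['1A', 'ws rank', 'rank', 'ws rank', 'rank', 'Drank'],
    'value': [1, 12, 34, 50, 3, 2],
})

# Explicit copy of the filtered rows
dfrank = df[df['columA'].str.contains('ws rank|rank')].copy()

# Modifying the copy, as in the question's code
dfrank['value'] = dfrank['value'].astype(int)
dfrank.loc[dfrank['value'] == 2, 'HP'] = 24

print('HP' in df.columns)     # False: df itself was not modified
print(dfrank.index.tolist())  # [1, 2, 3, 4, 5]: labels come from df
```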

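Since you want to map the values, you can build a dictionary and a mask, assign through .loc, and use fillna to fill the NaN values left by values that are not in the dictionary (those above 20), i.e.

dicct = {1:25,2:24,3:23,4:22,5:21,6:20,7:19,8:18,9:17,10:16,11:15,12:14,13:13,14:12,15:11,16:10,17:9,18:8,19:7,20:6}
df['HP'] = 0
mask = df["columA"].str.contains('ws rank|rank')
df.loc[mask,'HP'] = df.loc[mask,'value'].map(dicct).fillna(5)

Output:

    columA  value    HP
0       1A    1.0   0.0
1  ws rank   14.0  12.0
2     rank    5.0  21.0
3  ws rank    5.0  21.0
4     rank   23.0   5.0
5    Drank   24.0   5.0

If you want to fill a new column with a groupby sum, you can use groupby with transform, which returns each group's sum on every original row: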

Output:

    columA  value    HP  HPpoint
0       1A    1.0   0.0      0.0
1  ws rank   14.0  12.0     33.0
2     rank    5.0  21.0     26.0
3  ws rank    5.0  21.0     33.0
4     rank   23.0   5.0     26.0
5    Drank   24.0   5.0      5.0
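This also explains the null result in the question: groupby('columA')['HP'].sum() returns a Series indexed by the columA labels ('1A', 'Drank', ...), and assigning it to a column makes pandas align those labels against the frame's integer index, where nothing matches, so every entry becomes NaN. transform('sum') instead returns one value per original row on the original index. A minimal sketch, treating the HP values from the output table above as sample data; the column name HPpoint_wrong is hypothetical and only shows the failing assignment.

```python
import pandas as pd

df = pd.DataFrame({
    'columA': ['1A', 'ws rank', 'rank', 'ws rank', 'rank', 'Drank'],
    'HP': [0.0, 12.0, 21.0, 21.0, 5.0, 5.0],
})

# One row per group, indexed by the (sorted) columA labels
per_group = df.groupby('columA')['HP'].sum()
print(per_group.index.tolist())  # ['1A', 'Drank', 'rank', 'ws rank']

# Direct assignment aligns on index labels, which never match 0..5
df['HPpoint_wrong'] = per_group

# transform broadcasts each group's sum back onto the original rows
df['HPpoint'] = df.groupby('columA')['HP'].transform('sum')
print(df)
```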

Hope that helps.