I have a sample dataset:
import pandas as pd
df = {
'columA':['1A','ws rank','rank','ws rank','rank','Drank'],
'value': [ 1, 12, 34, 50, 3,2]
}
df = pd.DataFrame(df)
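1. I want to create a column 'HP' for the rows where columA is 'ws rank', 'rank', or 'Drank': if value is 1 then HP is 25, if value is 2 then HP is 24, and so on.
So I first created a smaller dataset containing only those rows, because my real dataset is very large. I then concatenate this dataset with the original one to bring in the 'HP' column. However, the concatenation produces duplicate rows, so there must be a simpler way.
My code: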
dfrank=df[df["columA"].str.contains('ws rank|rank')]
dfrank['value'] = dfrank['value'].astype(int)
dfrank.loc[dfrank.value == 1, 'HP'] = 25
dfrank.loc[dfrank.value == 2, 'HP'] = 24
dfrank.loc[dfrank.value == 3, 'HP'] = 23
dfrank.loc[dfrank.value == 4, 'HP'] = 22
dfrank.loc[dfrank.value == 5, 'HP'] = 21
dfrank.loc[dfrank.value == 6, 'HP'] = 20
dfrank.loc[dfrank.value == 7, 'HP'] = 19
dfrank.loc[dfrank.value == 8, 'HP'] = 18
dfrank.loc[dfrank.value == 9, 'HP'] = 17
dfrank.loc[dfrank.value == 10, 'HP'] = 16
dfrank.loc[dfrank.value == 11, 'HP'] = 15
dfrank.loc[dfrank.value == 12, 'HP'] = 14
dfrank.loc[dfrank.value == 13, 'HP'] = 13
dfrank.loc[dfrank.value == 14, 'HP'] = 12
dfrank.loc[dfrank.value == 15, 'HP'] = 11
dfrank.loc[dfrank.value == 16, 'HP'] = 10
dfrank.loc[dfrank.value == 17, 'HP'] = 9
dfrank.loc[dfrank.value == 18, 'HP'] = 8
dfrank.loc[dfrank.value == 19, 'HP'] = 7
dfrank.loc[dfrank.value == 20, 'HP'] = 6
dfrank.loc[(dfrank.value > 20)&(dfrank.value <= 50), 'HP'] = 5
df2=pd.concat([df, dfrank])
Is there a simpler way to write these conditions? Also, I keep getting this warning, even though I think I am already using the form it suggests:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
dfrank['value'] = dfrank['value'].astype(int)
H:/Code/PythonScripts/python_work/dataset1.py:20: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
dfrank.loc[dfrank.value == 1, 'HP'] = 25
C:\Users\amywang\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\indexing.py:477: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
2. Then I want to create an 'HPpoint' column that groups by the 'columA' values and sums the 'HP' values, but this does not work and returns null:
df2['HPpoint']=df2.groupby('columA')['HP'].sum()
Answer 0 (score: 1)
...interesting.
Not sure I fully understood all of your questions, but here is my take on the first half...
import pandas as pd
df = {
'columA':['1A','ws rank','rank','ws rank','rank','Drank'],
'value': [ 1, 12, 34, 50, 3,2]
}
df = pd.DataFrame(df)
df["hp"]=0
def calc_hp(row):
    # Default HP for rows whose label is not one of the rank labels.
    rv = 0
    if row['columA'] in ['ws rank', 'rank', 'Drank']:
        rv = 25 - int(row['value'])
    return rv
df['hp'] = df.apply(calc_hp,axis=1)
df
which returns
    columA  value  hp
0       1A      1   0
1  ws rank     12  13
2     rank     34  -9
3  ws rank     50 -25
4     rank      3  22
5    Drank      2  23
I pass the whole row to the apply function and then use (hopefully) the logic you specified.
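For comparison, here is a hedged, vectorized sketch of the same idea that follows the table in the question more closely (1 -> 25, 2 -> 24, ..., 20 -> 6, and 21 to 50 -> 5). The formula `26 - value` and the floor of 5 applied through `clip` are assumptions read off that table; the question does not say what should happen for values above 50 or below 1, so treat this as an illustration rather than the asker's exact rule.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'columA': ['1A', 'ws rank', 'rank', 'ws rank', 'rank', 'Drank'],
    'value': [1, 12, 34, 50, 3, 2],
})

# Rows that should receive an HP score (exact label match, as in calc_hp).
is_rank = df['columA'].isin(['ws rank', 'rank', 'Drank'])

# 26 - value reproduces 1 -> 25, 2 -> 24, ..., 20 -> 6.
# clip(lower=5) is an assumed floor so that larger values get 5.
hp = (26 - df['value']).clip(lower=5)

# Non-rank rows get 0, matching the default used in calc_hp.
df['hp'] = np.where(is_rank, hp, 0)
print(df)
```

This avoids calling a Python function once per row, which tends to matter on a large dataset.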
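Answer 1 (score: 1)
In pandas, when you select data and store it in a new variable, indexing a DataFrame can return a reference to the initial DataFrame. So you should copy the selection before using .loc on it as a new DataFrame, i.e.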
dfrank=df[df["columA"].str.contains('ws rank|rank')].copy()
This gives you an independent DataFrame, so assignments made through .loc on it are applied to the new DataFrame properly.
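A minimal sketch of that copy step on the question's sample data. The single assignment below stands in for the full list of .loc lines from the question, and the final check is only there to illustrate that the original df is left untouched.

```python
import pandas as pd

df = pd.DataFrame({
    'columA': ['1A', 'ws rank', 'rank', 'ws rank', 'rank', 'Drank'],
    'value': [1, 12, 34, 50, 3, 2],
})

# Take an explicit copy so dfrank is independent of df.
dfrank = df[df['columA'].str.contains('ws rank|rank')].copy()

# Assigning through .loc now modifies only dfrank.
dfrank.loc[dfrank['value'] == 2, 'HP'] = 24

print(dfrank)
print('HP' in df.columns)  # False: df itself has no HP column
```

Note that str.contains does a substring (regex) search, so the pattern 'ws rank|rank' also selects the 'Drank' row.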
Since you want to map the data, you can create a dictionary and a mask, then map through .loc, and use fillna to fill the NaN values, i.e.
dicct = {1:25,2:24,3:23,4:22,5:21,6:20,7:19,8:18,9:17,10:16,11:15,12:14,13:13,14:12,15:11,16:10,17:9,18:8,19:7,20:6}
df['HP'] = 0
mask=df["columA"].str.contains('ws rank|rank')
df.loc[mask,'HP'] = df.loc[mask,'value'].map(dicct).fillna(5)
Output:
    columA  value    HP
0       1A    1.0   0.0
1  ws rank   14.0  12.0
2     rank    5.0  21.0
3  ws rank    5.0  21.0
4     rank   23.0   5.0
5    Drank   24.0   5.0
If you want to fill a new column with the groupby sum, you can use transform on the grouped 'HP' column.
Output:
    columA  value    HP  HPpoint
0       1A    1.0   0.0      0.0
1  ws rank   14.0  12.0     33.0
2     rank    5.0  21.0     26.0
3  ws rank    5.0  21.0     33.0
4     rank   23.0   5.0     26.0
5    Drank   24.0   5.0      5.0
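The transform step can be sketched as follows on the question's sample data (the outputs shown above come from different values, so the numbers differ). The dictionary comprehension is assumed to be equivalent to the dicct mapping, and the HP column is built with where instead of .loc purely for brevity. The assignment in the question returns nulls because groupby(...).sum() yields one row per group, indexed by the columA labels, which do not line up with the integer row labels of the DataFrame; transform('sum') instead returns a result aligned with the original rows.

```python
import pandas as pd

df = pd.DataFrame({
    'columA': ['1A', 'ws rank', 'rank', 'ws rank', 'rank', 'Drank'],
    'value': [1, 12, 34, 50, 3, 2],
})

# Assumed equivalent of dicct: 1 -> 25, 2 -> 24, ..., 20 -> 6.
dicct = {v: 26 - v for v in range(1, 21)}
mask = df['columA'].str.contains('ws rank|rank')

# Map the values, use 5 where the dictionary has no entry,
# and 0 for rows outside the mask.
df['HP'] = df['value'].map(dicct).fillna(5).where(mask, 0)

# One sum per group, indexed by columA (this is what the question assigned).
per_group = df.groupby('columA')['HP'].sum()

# Same sums, broadcast back to every row of the original DataFrame.
df['HPpoint'] = df.groupby('columA')['HP'].transform('sum')
print(df)
```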
Hope that helps.