KeyError: 'column name'

Date: 2018-08-20 10:27:59

Tags: python pandas csv

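I want to perform an operation on each row and put the result in a new column.

I have a "likes" column and a "dislikes" column, and I want to create a new "ratio" column. I took the following from StackOverflow, but it does not work: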

for index, row in data.iterrows(): 
    if row['dislikes'] > 0:
        data["ratio"][index] =  data.likes[index]/data.dislikes[index]

I want to avoid dividing by zero, so the formula likes / dislikes should only be applied when "dislikes" is greater than zero.

1 Answer:

Answer 0: (score: 4)

I think it is best to avoid loops in pandas, because they are slow when a vectorized solution exists:

mask = data['dislikes'] > 0
data.loc[mask, 'ratio'] = data.loc[mask, 'likes'] / data.loc[mask, 'dislikes']
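
A minimal, self-contained sketch of the masked assignment above, using made-up sample values. It also assumes that the KeyError in the question comes from reading data["ratio"] before that column exists, which the question does not confirm:

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
data = pd.DataFrame({'likes': [10, 20, 0, 0], 'dislikes': [5, 0, 10, 0]})

# Reading a column that does not exist yet raises KeyError.
# Assumption: this is what data["ratio"][index] hit in the question's loop.
try:
    data["ratio"]
    missing_column_error = None
except KeyError as exc:
    missing_column_error = exc

# Assigning through .loc creates the column instead.
# Rows where the mask is False are left as NaN.
mask = data['dislikes'] > 0
data.loc[mask, 'ratio'] = data.loc[mask, 'likes'] / data.loc[mask, 'dislikes']
print(data)
```

Only the rows selected by the mask are divided, so no division by zero takes place here.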

Or:

data["ratio"] = np.where(mask, data['likes'] / data['dislikes'], np.nan)
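
One thing to be aware of with np.where: the division is computed for every row first, and the mask only decides which results are kept. The sketch below, on made-up sample values, shows that pandas returns inf or NaN for the zero-denominator rows rather than raising, and that np.where then discards those entries:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
data = pd.DataFrame({'likes': [10, 20, 0, 0], 'dislikes': [5, 0, 10, 0]})
mask = data['dislikes'] > 0

# The division runs on all rows: 20 / 0 gives inf and 0 / 0 gives NaN.
raw = data['likes'] / data['dislikes']

# np.where keeps raw only where the mask is True and puts NaN elsewhere.
data["ratio"] = np.where(mask, raw, np.nan)
print(data)
```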

Edit:

I think the NaN should be changed to 0:

import numpy as np
import pandas as pd

data = pd.DataFrame({'likes':[10,20,0,0], 'dislikes':[5,0,10,0]})

mask = data['dislikes'] > 0
data["ratio"] = np.where(mask, data['likes'] / data['dislikes'], 0)
print (data)
   likes  dislikes  ratio
0     10         5    2.0
1     20         0    0.0
2      0        10    0.0
3      0         0    0.0

Edit:

data = pd.DataFrame({'likes':[10,20,0,0], 'dislikes':[5,0,10,0]})

Filter the DataFrame by 2 different columns:

a = data.loc[data.likes > 0, 'likes']
b = data.loc[data.dislikes > 0, 'dislikes']
print (a)
0    10
1    20 <-different index 1
Name: likes, dtype: int64

print (b)
0     5
2    10 <-different index 2
Name: dislikes, dtype: int64

If you divide two Series with different indexes, you get NaN, because pandas tries to align the data by index:

c = a/b
print (c)
0    2.0
1    NaN
2    NaN
dtype: float64

The data is aligned in the same way when you create a new column: NaN is added for index 3, which does not exist in c:

data['ratio'] = c
print (data)
   likes  dislikes  ratio
0     10         5    2.0
1     20         0    NaN
2      0        10    NaN
3      0         0    NaN
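
If zeros are preferred over the NaN values produced by alignment, as in the first edit, one possible follow-up is fillna. This is only a sketch on the same sample data, not the only way to do it:

```python
import pandas as pd

data = pd.DataFrame({'likes': [10, 20, 0, 0], 'dislikes': [5, 0, 10, 0]})

# Filter each column separately, so the two Series have different indexes.
a = data.loc[data.likes > 0, 'likes']
b = data.loc[data.dislikes > 0, 'dislikes']

# Division aligns on the index; labels missing from either side become NaN.
c = a / b

# Assignment aligns c to data's index, then fillna replaces the NaN with 0.
data['ratio'] = c
data['ratio'] = data['ratio'].fillna(0)
print(data)
```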