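I'm almost finished with a program that iterates over a CSV file built from two CSV files. I'm stuck on the last column, which is supposed to label damage_done > 700000 as "High", damage_done < 300000 as "Low", and 300000 <= damage_done <= 699999 as "Medium". I tried writing a loop and assigning the values directly, but it throws the following error: TypeError: ("'>' not supported between instances of 'str' and 'int'", 'occurred at index 0').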
1.
def quality(row):
    if (row['damage_done'] > 700000):
        df3['dps_quality'] = 'High'
    if (row['damage_done'] < 300000):
        df3['dps_quality'] = 'Low'
    if (300000 <= row['damage_done'] <= 699999):
        df3['dps_quality'] = 'Medium'
df3['dps_quality'] = df3.apply(quality, axis = 1)
df3
and 2.
df3['dps_quality'][df3['damage_done'] > 700000] = 'High'
df3['dps_quality'][df3['damage_done'] < 300000] = 'Low'
df3['dps_quality'][300000 <= df3['damage_done'] <= 699000] = 'High'
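Attempt 2 has two problems of its own, separate from the string dtype. First, `df3['dps_quality'][mask] = ...` is chained indexing and would raise a KeyError while the dps_quality column does not exist yet. Second, Python expands `300000 <= s <= 699000` on a Series into `(300000 <= s) and (s <= 699000)`, and `and` asks for the truth value of a whole Series, which pandas rejects with a ValueError. A minimal sketch of the mask approach with `.loc` and `Series.between`, assuming damage_done is already numeric (the three-row DataFrame is hypothetical sample data):

```python
import pandas as pd

# Hypothetical numeric sample standing in for the merged df3
df3 = pd.DataFrame({"damage_done": [800000, 250000, 500000]})

# Create the column first so the masked assignments below have a target
df3["dps_quality"] = None

# .loc[row_mask, column] assigns into df3 itself, avoiding chained indexing
df3.loc[df3["damage_done"] > 700000, "dps_quality"] = "High"
df3.loc[df3["damage_done"] < 300000, "dps_quality"] = "Low"
# between() replaces the chained comparison; both bounds are inclusive by default
df3.loc[df3["damage_done"].between(300000, 699999), "dps_quality"] = "Medium"

print(df3)
```

Values outside all three ranges (for example exactly 700000, which the stated rules do not cover) keep the initial None.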
import pandas as pd
import io
import requests as r
url = 'http://drd.ba.ttu.edu/isqs6339/hw/hw2/'
path = '/Users/jeredwilloughby/Desktop/Business Intelligence/'
file1 = 'players.csv'
file2 = 'player_sessions.csv'
fileout = 'pandashw.csv'
res1 = r.get(url + file1)
res1.status_code
df1 = pd.read_csv(io.StringIO(res1.text), delimiter='|')
df1
res2 = r.get(url + file2)
res2.status_code
df2 = pd.read_csv(io.StringIO(res2.text), delimiter=',')
df2.head(5)
df2.tail(5)
df3 = df1.merge(df2, how="left", on="playerid")
df3.describe()
list(df3)
df3.count()
df3['damage_done'].fillna(0, inplace=True)
df3.count()
df3.to_csv(path + fileout)
def performance(row):
    return (row['damage_done']*2.5 + row['healing_done']*4.5)/4
df3['player_performance_metric'] = df3.apply(performance, axis = 1)
df3
df3.to_csv(path + fileout)
def quality(row):
    if (row['damage_done'] > 700000):
        df3['dps_quality'] = 'High'
    if (row['damage_done'] < 300000):
        df3['dps_quality'] = 'Low'
    if (300000 <= row['damage_done'] <= 699999):
        df3['dps_quality'] = 'Medium'
df3['dps_quality'] = df3.apply(quality, axis = 1)
df3
Expected: the CSV output file will have a new column titled 'dps_quality' containing the corresponding High, Medium, or Low value.
Actual: TypeError: ("'>' not supported between instances of 'str' and 'int'", 'occurred at index 0')
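The error can be reproduced in isolation. This minimal sketch assumes, as the message suggests, that damage_done holds strings; the two-row DataFrame is hypothetical sample data, not the homework files. Python 3 refuses to order a str against an int, and pandas passes that TypeError through from inside `apply` (some older pandas versions append the "occurred at index 0" note seen above).

```python
import pandas as pd

# Hypothetical sample: the numbers were read as text, so each value is a str
df = pd.DataFrame({"damage_done": ["800000", "250000"]})

def quality(row):
    # "800000" > 700000 compares a str with an int, which Python 3 rejects
    return "High" if row["damage_done"] > 700000 else "Other"

error_message = None
try:
    df.apply(quality, axis=1)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```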
Answer (score: 1)
The column damage_done should contain numeric objects (int or float), not strings.
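One way to make the column numeric once, right after loading, is `pd.to_numeric`. This is a sketch under assumptions: the sample values are hypothetical, and `errors="coerce"` is a choice that silently turns anything unparseable (such as "n/a") into NaN, which `fillna(0)` then replaces, mirroring the `fillna(0)` already in the question's code. Whether zero is the right default for bad entries depends on the real data.

```python
import pandas as pd

# Hypothetical sample mimicking a damage_done column read as text
df3 = pd.DataFrame({"damage_done": ["800000", "250000", None, "n/a"]})

# errors="coerce" converts unparseable entries to NaN instead of raising;
# fillna(0) then treats missing or bad values as zero damage
df3["damage_done"] = pd.to_numeric(df3["damage_done"], errors="coerce").fillna(0)

print(df3["damage_done"].dtype)
```

After this conversion, comparisons such as `df3["damage_done"] > 700000` work without casting inside the labeling function.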
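The .apply method calls the function quality for each row. The values returned by the function make up the Series that the method returns. As written in your code, this Series is assigned to the dps_quality column of the DataFrame. Therefore, there is no need to use the column name inside the function.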
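A small sketch of that return-value flow, using a hypothetical two-row DataFrame and a hypothetical function `label` (not the question's `quality`). In the question's version, the function assigns a scalar to the whole column on every call and implicitly returns None, so even without the type error the final assignment would leave None in every row.

```python
import pandas as pd

# Hypothetical numeric sample
df = pd.DataFrame({"damage_done": [800000, 250000]})

def label(row):
    # Return one value for this row; do not modify the DataFrame here
    return "High" if row["damage_done"] > 700000 else "Not high"

# With axis=1, apply calls label once per row and collects the return values
# into a Series aligned with df's index
result = df.apply(label, axis=1)
df["dps_quality"] = result

print(df)
```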
Taking both points into account, a possible fix is:
def quality(damage_done):
    # this line ensures that the value will be interpreted as an integer
    damage_done = int(damage_done)
    if damage_done > 700000:
        # now we are returning a value, instead of assigning it directly to the column
        return 'High'
    if damage_done < 300000:
        return 'Low'
    # removing the last check as it is not necessary
    return 'Medium'

# we are using the .apply method only on a Series. This makes the reading easier
df3['dps_quality'] = df3['damage_done'].apply(quality)
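If the column is already numeric, a vectorized alternative to a per-value function is `pd.cut`, which assigns each value to a labeled bin. This is a swapped-in technique, not the approach above, and the sample data is hypothetical. The bin edges are chosen so that integer values get the same label as the function above, including exactly 700000, which lands in "Medium" because the function only returns 'High' for values strictly greater than 700000. Non-integer values between 299999 and 300000 would differ, since the function truncates with `int()` first.

```python
import pandas as pd

# Hypothetical numeric sample; the real damage_done column must already be numeric
df3 = pd.DataFrame({"damage_done": [800000, 250000, 500000, 700000]})

# pd.cut bins are right-inclusive by default:
# (-inf, 299999] -> Low, (299999, 700000] -> Medium, (700000, inf] -> High
bins = [float("-inf"), 299999, 700000, float("inf")]
df3["dps_quality"] = pd.cut(df3["damage_done"], bins=bins, labels=["Low", "Medium", "High"])

print(df3)
```

`pd.cut` returns a categorical column; `to_csv` writes its labels as plain text.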