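I am trying to group the values of a column 'neighborhood' and assign a numeric value to each group. The values look like this: #Queens#Jackson Heights#, #Manhattan#Upper East Side#Sutton Place#, #Brooklyn#Williamsburg#, #Bronx#East Bronx#Throgs Neck#. (Each value has 2 or 3, sometimes 4 or 5, '#'-delimited labels.) I used a plain if/else loop, and it worked for the first 3 values, as shown in the attached image. But I am not sure whether it is working correctly. Please help me group the values and assign a value to each group. The if/else loop I used is as follows: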
# Create a list to store the codes
grades = []
# For each row in the column,
for row in new_train1['neighborhood']:
    # if the string compares greater than '#Queens#',
    if row > '#Queens#':
        # append code 1
        grades.append('1')
    # else, if it compares greater than '#Manhattan#',
    elif row > '#Manhattan#':
        # append code 2
        grades.append('2')
    # else, if it compares greater than '#Bronx#',
    elif row > '#Bronx#':
        # append code 3
        grades.append('3')
    # else, if it compares greater than '#Brooklyn#',
    elif row > '#Brooklyn#':
        # append code 4
        grades.append('4')
    # otherwise,
    else:
        # append the fallback code 0
        grades.append('0')
Answer 0: (score: 0)
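Please avoid pasting images and making us test our typing skills. If I understood your question correctly, I would do something like this: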
import pandas as pd

# creating data frame
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": ["#Queens#Jackson Heights#", "Manhattan#Upper East Side#Sutton Place#",
          "Bronx#West East Side#", "Manhattan#Upper East Side#", "#Manhattan#Downtown#Chelsea"],
})
# creating replacement dictionary
replace_dic = {"Queens": 1, "Jackson Heights": 2, "Manhattan": 3, "Upper East Side": 4, "Sutton Place": 5,
               "Bronx": 6, "West East Side": 7, "Downtown": 8, "Chelsea": 9}
# replacing: split on '#', drop the empty pieces, look up each label
df['C'] = df['B'].str.split("#").apply(lambda x: [replace_dic[i] for i in x if i != ''])
# result
   A                                        B          C
0  1                 #Queens#Jackson Heights#     [1, 2]
1  2  Manhattan#Upper East Side#Sutton Place#  [3, 4, 5]
2  3                    Bronx#West East Side#     [6, 7]
3  4               Manhattan#Upper East Side#     [3, 4]
4  5              #Manhattan#Downtown#Chelsea  [3, 8, 9]
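One caveat: `replace_dic[i]` raises a `KeyError` for any label that is missing from the dictionary, such as `Brooklyn` or `Williamsburg` from the question. Here is a minimal sketch of a more forgiving variant, assuming unknown labels should map to 0. The default of 0, the helper name `encode_labels`, and the two sample rows are my own choices for illustration.

```python
import pandas as pd

# Two sample rows; the second uses labels that are not in the dictionary.
df = pd.DataFrame({"B": ["#Queens#Jackson Heights#", "#Brooklyn#Williamsburg#"]})
replace_dic = {"Queens": 1, "Jackson Heights": 2}

def encode_labels(labels, default=0):
    # Skip the empty strings produced by a leading/trailing '#',
    # and fall back to `default` for labels missing from replace_dic.
    return [replace_dic.get(label, default) for label in labels if label != ""]

df["C"] = df["B"].str.split("#").apply(encode_labels)
print(df["C"].tolist())  # [[1, 2], [0, 0]]
```

Whether a silent default is better than a loud `KeyError` depends on the data; a loud failure is often preferable while the dictionary is still being built.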
Based on your comment, I think you are looking for something like this:
def replacefunc(x):
    # drop the empty strings left by a leading/trailing '#'
    x = [i for i in x if i != '']
    # return the code of the first label only
    return replace_dic[x[0]]

df['D'] = df['B'].str.split("#").apply(replacefunc)
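As a side note on the loop in the question: `>` on strings compares them lexicographically, so `'#Brooklyn#Williamsburg#' > '#Bronx#'` is true and a Brooklyn row falls into the Bronx branch. Matching on the first label avoids this. The sketch below does the same first-label lookup without `apply`, assuming the codes from the question (Queens 1, Manhattan 2, Bronx 3, Brooklyn 4) and 0 for everything else. The name `borough_codes` and the sample rows are mine.

```python
import pandas as pd

df = pd.DataFrame({"B": ["#Queens#Jackson Heights#", "Manhattan#Upper East Side#",
                         "#Brooklyn#Williamsburg#", "#Staten Island#", None]})
borough_codes = {"Queens": 1, "Manhattan": 2, "Bronx": 3, "Brooklyn": 4}

# Strip leading/trailing '#', then keep the text before the first remaining '#'.
borough = df["B"].str.strip("#").str.split("#").str[0]

# Unknown boroughs and missing values become NaN in map(); turn those into 0.
df["D"] = borough.map(borough_codes).fillna(0).astype(int)
print(df["D"].tolist())  # [1, 2, 4, 0, 0]
```

Using `map` with `fillna` keeps missing rows and unlisted boroughs from raising, at the cost of lumping them together under 0.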
Answer 1: (score: 0)
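Thank you all for the help and input. I removed the '#' delimiters with a simple split, and then used a for loop that checks only the first word in each row.
It gives me the expected output, but it also raises an index out of range error, which I am still working on. The code is as follows: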
train = pd.DataFrame(train, columns=['id', 'listing_type', 'floor', 'latitude', 'longitude', 'price',
                                     'beds', 'baths', 'total_rooms', 'square_feet', 'pet_details',
                                     'neighborhood'])
# Create a list to store the codes
grades = []
# For each row in the column, split on '#';
# row[1] is the first label when the string starts with '#'
for row in train['neighborhood'].str.split('#'):
    # if the first label is Queens,
    if row[1] == 'Queens':
        # append code 1
        grades.append('1')
    # else, if it is Manhattan,
    elif row[1] == 'Manhattan':
        # append code 2
        grades.append('2')
    # else, if it is Bronx,
    elif row[1] == 'Bronx':
        # append code 3
        grades.append('3')
    # else, if it is Brooklyn,
    elif row[1] == 'Brooklyn':
        # append code 4
        grades.append('4')
    # otherwise,
    else:
        # append the fallback code 0
        grades.append('0')
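One plausible cause of the index out of range error, which I cannot confirm without seeing the data, is a row that contains no '#' at all (or is empty): the split then yields a single-element list and `row[1]` does not exist. A minimal sketch of a guarded version follows, assuming such rows should get the fallback code '0'. The helper `first_label`, the `codes` dictionary, and the toy series standing in for `train['neighborhood']` are hypothetical.

```python
import pandas as pd

# Toy stand-in for train['neighborhood']; the real data is not shown in the thread.
neighborhood = pd.Series(["#Queens#Jackson Heights#", "Bronx", "", None])
codes = {"Queens": "1", "Manhattan": "2", "Bronx": "3", "Brooklyn": "4"}

def first_label(value):
    # Missing values (None/NaN) are not strings, so they have no label.
    if not isinstance(value, str):
        return None
    # Drop empty pieces so a missing leading '#' does not shift positions.
    parts = [p for p in value.split("#") if p]
    return parts[0] if parts else None

# Look up the first label; anything unknown or missing gets '0'.
grades = [codes.get(first_label(v), "0") for v in neighborhood]
print(grades)  # ['1', '3', '0', '0']
```

Filtering out the empty pieces also means the lookup no longer depends on every value starting with '#'.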