I have this DataFrame:
+----+-------------+-------+
| ID | Education   | Score |
+----+-------------+-------+
|  1 | High School | 7.884 |
|  2 | Bachelors   | 6.952 |
|  3 | High School | 8.185 |
|  4 | High School | 6.556 |
|  5 | Bachelors   | 6.347 |
|  6 | Master      | 6.794 |
+----+-------------+-------+
I want to create a new column that categorizes the Score column, labelling each score as "Bad", "Good", or "Very good".
It might look like this:
+----+-------------+-------+-----------+
| ID | Education   | Score | Labels    |
+----+-------------+-------+-----------+
|  1 | High School | 7.884 | Good      |
|  2 | Bachelors   | 6.952 | Bad       |
|  3 | High School | 8.185 | Very good |
|  4 | High School | 6.556 | Bad       |
|  5 | Bachelors   | 6.347 | Bad       |
|  6 | Master      | 6.794 | Bad       |
+----+-------------+-------+-----------+
How can I do this?
Thanks in advance.
Answer 0 (score: 4)
import pandas as pd
# initialize list of lists
data = [[1,'High School',7.884], [2,'Bachelors',6.952], [3,'High School',8.185], [4,'High School',6.556],[5,'Bachelors',6.347],[6,'Master',6.794]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['ID', 'Education', 'Score'])
df['Labels'] = ['Bad' if x<7.000 else 'Good' if 7.000<=x<8.000 else 'Very Good' for x in df['Score']]
df
   ID    Education  Score     Labels
0   1  High School  7.884       Good
1   2    Bachelors  6.952        Bad
2   3  High School  8.185  Very Good
3   4  High School  6.556        Bad
4   5    Bachelors  6.347        Bad
5   6       Master  6.794        Bad
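The list comprehension above evaluates the conditions one score at a time in Python. As a rough sketch of a vectorized variant (not part of the original answer), `numpy.select` can evaluate the same thresholds on the whole column at once. The cut-offs 7 and 8 are taken from the answer; the `conditions` and `choices` names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'Education': ['High School', 'Bachelors', 'High School',
                  'High School', 'Bachelors', 'Master'],
    'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794],
})

# Conditions are checked in order; the first one that is True wins.
conditions = [df['Score'] < 7, df['Score'] < 8]
choices = ['Bad', 'Good']

# Scores matching neither condition (8 and above) get the default label.
df['Labels'] = np.select(conditions, choices, default='Very Good')
print(df)
```

Because the conditions are tried in order, the second one only needs `< 8`; anything below 7 has already been labelled 'Bad'.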
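Answer 1 (score: 3)
I assume it is the scores that you want to map to labels. You can define a mapping function that takes a score as input and returns the label: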
def map_score(score):
    # Thresholds are checked from highest to lowest.
    if score >= 8:
        return "Very good"
    elif score >= 7:
        return "Good"
    else:
        return "Bad"

# Apply the function to every value in the Score column.
df["Labels"] = df["Score"].apply(map_score)
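If the thresholds are fixed bin edges, the same mapping could probably also be written with `pandas.cut`, which bins a numeric column directly. This is a sketch of an alternative rather than what the answer above does; it assumes the same cut-offs (7 and 8) and uses infinite outer edges so that no score falls outside the bins.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

# right=False makes each bin closed on the left: [-inf, 7), [7, 8), [8, inf),
# which matches the ">= 7" and ">= 8" checks in map_score.
df['Labels'] = pd.cut(
    df['Score'],
    bins=[-np.inf, 7, 8, np.inf],
    labels=['Bad', 'Good', 'Very good'],
    right=False,
)
print(df)
```

Note that the resulting column has a categorical dtype rather than plain strings; `df['Labels'].astype(str)` converts it if that matters.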
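Answer 2 (score: 1)
Here is my solution. My goal was to avoid if-else and make the solution more flexible.
The main idea is to create a DataFrame of labels with their minimum and maximum values, and then find the matching label for each score.
Code: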
import pandas as pd


class Label(object):
    name = ''
    min = 0
    max = 100

    def __init__(self, name, min, max):
        self.name = name
        self.min = min
        self.max = max

    def data(self):
        return [self.name, self.min, self.max]


class Labels:
    # Each label covers the half-open range [Min, Max).
    labels = [
        Label('Bad', 0, 7).data(),
        Label('Good', 7, 8).data(),
        Label('Very good', 8, 100).data()]
    labels_df = pd.DataFrame(labels, columns=['Label', 'Min', 'Max'])

    @staticmethod
    def get_label(score):
        # Select the row whose range contains the score.
        lbs = Labels.labels_df
        tlab = lbs[(lbs.Min <= score) & (lbs.Max > score)]
        return tlab.Label.values[0]


class edu:
    hs = 'High School'
    b = 'Bachelors'
    m = 'Master'


df = pd.DataFrame({
    'ID': range(6),
    'Education': [edu.hs, edu.b, edu.hs, edu.hs, edu.b, edu.m],
    'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

df['Label'] = df.apply(lambda row: Labels.get_label(row['Score']), axis=1)
print(df)
Output:
   ID    Education  Score      Label
0   0  High School  7.884       Good
1   1    Bachelors  6.952        Bad
2   2  High School  8.185  Very good
3   3  High School  6.556        Bad
4   4    Bachelors  6.347        Bad
5   5       Master  6.794        Bad
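The table-driven idea can also be sketched without a per-row filter. The version below is an assumption-laden simplification, not the answer's code: it keeps only a lower bound (`Min`) per label, requires the table to be sorted by `Min`, and assumes no score is below the smallest `Min`. `numpy.searchsorted` then finds the matching row for every score in one call.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table: one row per label, sorted by its lower bound.
labels_df = pd.DataFrame({
    'Label': ['Bad', 'Good', 'Very good'],
    'Min': [0, 7, 8],
})

df = pd.DataFrame({'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

# side='right' returns the number of lower bounds that are <= each score;
# subtracting 1 gives the index of the last such row.
positions = np.searchsorted(
    labels_df['Min'].to_numpy(), df['Score'].to_numpy(), side='right') - 1

# Look up the label text for every position at once.
df['Label'] = labels_df['Label'].to_numpy()[positions]
print(df)
```

Adding or changing a label then only means editing rows of `labels_df`, which keeps the flexibility the answer was aiming for. A score below the smallest `Min` would produce index -1 and silently pick the last label, so that case would need an explicit check.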