I can't print special characters, either with `print` or in a histogram.
    import pandas as pd

    def class_data():
        df = pd.read_csv('words.csv', sep=',')
        df = df.astype(str)
        # Fold accented e/a variants to plain ASCII letters
        df = df.replace(['é', 'è', 'È', 'É'], 'e', regex=True)
        df = df.replace(['à', 'â', 'À'], 'a', regex=True)
        df.manual_raw_value = df.manual_raw_value.str.lower()
        # Collect every distinct character that occurs in the column
        classes = set(df.manual_raw_value.apply(list).sum())
        print("number of classes is ", len(classes))
        print("classes are ", classes)
        # Histogram of character frequencies
        pd.Series(list(df.manual_raw_value.str.cat())).value_counts().plot(kind="bar")
I get:
('number of classes is ', 73)
and the classes:
('classes are ', set(['\x82', '\x87', '*', '\xac', '\xaf', '\xae', '>', '!', ' ', '"', '%', "'", '\xb0', ')', '(', '+', '\xaa', '-', ',', '/', '.', '1', '0', '3', '2', '5', '4', '7', '6', '9', '8', '\xbb', ':', '=', '?', '\xb4', '@', '\xc3', '\xc2', '\xa7', '\xa1', '\xb9', '\xe2', '_', 'a', '&', 'c', 'b', 'e', 'd', 'g', 'f', 'i', 'h', 'k', 'j', 'm', 'l', 'o', 'n', 'q', 'p', 's', 'r', 'u', 't', 'w', 'v', 'y', 'x', 'z', '\xab', '\x94']))
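Escapes such as `'\xc3'` and `'\xc2'` are typical lead bytes of two-byte UTF-8 sequences, so the set may contain single bytes rather than whole characters. Here is a minimal sketch of that effect, written for Python 3 and using a hypothetical sample string (not taken from `words.csv`), assuming the data is UTF-8 encoded:

```python
# Hypothetical sample text; it is not taken from words.csv.
sample = "ça coûte 5°"

# The same text as it would be stored in a UTF-8 encoded file.
raw = sample.encode("utf-8")

# Splitting the undecoded bytes yields one "class" per byte,
# so each accented character contributes two separate entries.
byte_classes = {bytes([b]) for b in raw}

# Decoding first yields one class per character.
char_classes = set(raw.decode("utf-8"))

print(sorted(byte_classes))
print(sorted(char_classes))
```

In this sketch `ç` shows up as the two bytes `\xc3` and `\xa7` when the text is split before decoding, which resembles the entries in the output above.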
Here is the histogram.
Why do I get `?` in the histogram for the special characters?
And with `print("classes are ", classes)` I get escapes such as `'\xab'` and `'\x94'` for the special characters. How can I display the appropriate characters? Is this related to encoding?
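One thing that might be worth trying is to tell `pd.read_csv` which encoding the file uses, so that each accented letter is handled as one character. The sketch below is written for Python 3 and makes two assumptions that may not hold for the real data: the file is UTF-8, and the in-memory `csv_bytes` is only a hypothetical stand-in for `words.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for words.csv (assumption: the real file is UTF-8).
csv_bytes = "manual_raw_value\nçà et là\ncoûte 5°\n".encode("utf-8")

# encoding="utf-8" asks pandas to decode the bytes into text on load.
df = pd.read_csv(io.BytesIO(csv_bytes), sep=",", encoding="utf-8")

# Join the column into one lowercase string of decoded characters.
text = df.manual_raw_value.astype(str).str.lower().str.cat()

# Distinct characters and their frequencies, one entry per character.
classes = set(text)
counts = pd.Series(list(text)).value_counts()

print("number of classes is", len(classes))
print("classes are", sorted(classes))
```

Calling `counts.plot(kind="bar")` would then draw the same kind of histogram as in the question. Whether the tick labels render correctly also depends on the font matplotlib uses, which this sketch does not address.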