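I want to list the name of every label/string that occurs in a particular column. Each label appears multiple times in its column (e.g. Fleet, Travel, etc.). For example: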
Column1 Column2
Facility Machine
Fleet Other
Travel Leased Vehicles
...... .......
How can I write code that extracts these labels into a numpy array?
Thanks.
Desired output, e.g.: feature_labels = np.array(['Column1_Facility', 'Column1_Fleet', 'Column2_Machine', ...])
Answer 0 (score: 0)
I'm not sure I fully understand the question, but here is my attempt:
import pandas as pd

df = pd.DataFrame({'Column1': ['Facility', 'Fleet', 'Travel'], 'Column2': ['Machine', 'Other', 'Leased Vehicles']})
df
#Outputs:
Column1 Column2
0 Facility Machine
1 Fleet Other
2 Travel Leased Vehicles
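Then loop over the columns and prepend each column name to its values to build the feature names:
for col in df.columns:
    # Prefix every value in this column with the column name
    df[col] = df[col].apply(lambda x: f'{col}_{x}')
This gives you: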
Column1 Column2
0 Column1_Facility Column2_Machine
1 Column1_Fleet Column2_Other
2 Column1_Travel Column2_Leased Vehicles
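The loop above overwrites df in place. If the original values are still needed, a minimal sketch of a non-mutating variant is shown here; it assumes every column holds plain strings, and the name `prefixed` is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': ['Facility', 'Fleet', 'Travel'],
    'Column2': ['Machine', 'Other', 'Leased Vehicles'],
})

# Build a new DataFrame; s.name is the column name of each Series.
# Assumes all values are strings, so string concatenation is valid.
prefixed = df.apply(lambda s: s.name + '_' + s)

print(prefixed)
```

Here df keeps its original values and `prefixed` holds the labelled copy.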
Now you can simply extract the values of each column:
df.Column1.values
Result:
array(['Column1_Facility','Column1_Fleet','Column1_Travel'], dtype = object)
Edit:
If you only want the unique values in a column, for example when the data contains repeated rows:
Column1 Column2
0 Column1_Facility Column2_Machine
1 Column1_Fleet Column2_Other
2 Column1_Travel Column2_Leased Vehicles
3 Column1_Facility Column2_Machine
you need to use:
df.Column1.unique()
Result:
array(['Column1_Facility','Column1_Fleet','Column1_Travel'], dtype = object)
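The question asks for a single array covering all columns, so the per-column steps above could be combined roughly as follows. This is only a sketch under the assumption that the labels should be unique within each column and listed in order of first appearance; the sample data (with one repeated row) is made up for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data with a repeated row
df = pd.DataFrame({
    'Column1': ['Facility', 'Fleet', 'Travel', 'Facility'],
    'Column2': ['Machine', 'Other', 'Leased Vehicles', 'Machine'],
})

# For each column, take its unique values (in order of first appearance)
# and prefix them with the column name.
feature_labels = np.array([
    f'{col}_{label}'
    for col in df.columns
    for label in df[col].unique()
])

print(feature_labels)
```

This leaves df untouched, since the prefix is added while building the array rather than by rewriting the columns.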
Answer 1 (score: 0)
numpy has a char module for quasi-vectorized string operations. For example, you can use np.char.add:
import functools as ft
import numpy as np
data
# array([['Column1', 'Column2'],
# ['Facility', 'Machine'],
# ['Fleet', 'Other'],
# ['Travel', 'Leased Vehicles'],
# ['......', '.......']], dtype='<U15')
ft.reduce(np.char.add, (data[:1], '_', data[1:]))
# array([['Column1_Facility', 'Column2_Machine'],
# ['Column1_Fleet', 'Column2_Other'],
# ['Column1_Travel', 'Column2_Leased Vehicles'],
# ['Column1_......', 'Column2_.......']], dtype='<U31')
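To get from this 2-D result to one flat array of unique labels, something like the following might work. It is a sketch that assumes the first row of data holds the column names; the sample array (with a repeated row) is illustrative. Note that np.unique returns sorted values, so the order of first appearance is not preserved:

```python
import numpy as np

# Illustrative data: first row is the header, last row repeats an earlier one
data = np.array([
    ['Column1', 'Column2'],
    ['Facility', 'Machine'],
    ['Fleet', 'Other'],
    ['Travel', 'Leased Vehicles'],
    ['Facility', 'Machine'],
])

header, values = data[:1], data[1:]

# Broadcast the header row against the value rows: "<column>_<value>"
labelled = np.char.add(np.char.add(header, '_'), values)

# Deduplicate each column separately, then join into one flat array
feature_labels = np.concatenate([np.unique(col) for col in labelled.T])

print(feature_labels)
```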