I have a DataFrame with a column that contains a series of strings:
import pandas as pd
books = pd.DataFrame([[1,'In Search of Lost Time'],[2,'Don Quixote'],[3,'Ulysses'],[4,'The Great Gatsby'],[5,'Moby Dick']], columns = ['Book ID', 'Title'])
Book ID Title
0 1 In Search of Lost Time
1 2 Don Quixote
2 3 Ulysses
3 4 The Great Gatsby
4 5 Moby Dick
and a sorted list of boundaries:
boundaries = ['AAAAAAA','The Great Gatsby', 'zzzzzzzz']
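I'd like to use these boundaries to sort the values in the DataFrame into alphabetical bins, similar to how pd.cut() works on numeric data. My desired output looks like this: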
Book ID Title binning
0 1 In Search of Lost Time ['AAAAAAA','The Great Gatsby')
1 2 Don Quixote ['AAAAAAA','The Great Gatsby')
2 3 Ulysses ['The Great Gatsby','zzzzzzzz')
3 4 The Great Gatsby ['The Great Gatsby','zzzzzzzz')
4 5 Moby Dick ['AAAAAAA','The Great Gatsby')
Is this possible?
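For comparison, this is the numeric behaviour the question refers to. A minimal sketch with made-up numbers: pd.cut with right=False produces left-closed intervals [a, b), which is the same shape as the labels in the desired output.

```python
import pandas as pd

# Made-up numeric data, purely to show what pd.cut does with right=False
nums = pd.Series([1, 5, 10, 12])

# right=False makes every interval left-closed, i.e. [a, b), like the desired string bins
binned = pd.cut(nums, bins=[0, 10, 20], right=False)
print(binned)
```

Here 10 lands in [10, 20) rather than [0, 10), because each interval includes its left edge and excludes its right edge. That is the rule the string bins should follow, which is why 'The Great Gatsby' belongs in ['The Great Gatsby','zzzzzzzz').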
Answer 0 (score: 5)
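import numpy as np
# Only the interior edge is needed: searchsorted returns 0 for titles before it and 1 for titles at or after it
boundaries = np.array(['The Great Gatsby'])
bins = np.array(['[A..The Great Gatsby)', '[The Great Gatsby..Z]'])
# side='right' puts a title equal to a boundary into the bin that starts there, matching the half-open [a..b) labels
books.assign(binning=bins[boundaries.searchsorted(books.Title, side='right')])
Book ID Title binning
0 1 In Search of Lost Time [A..The Great Gatsby)
1 2 Don Quixote [A..The Great Gatsby)
2 3 Ulysses [The Great Gatsby..Z]
3 4 The Great Gatsby [The Great Gatsby..Z]
4 5 Moby Dick [A..The Great Gatsby)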
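If you want to keep the question's own boundaries list (including the outer edges 'AAAAAAA' and 'zzzzzzzz') and get labels in exactly the requested format, the same searchsorted idea can be wrapped in a small helper. alpha_cut is a hypothetical name, not a pandas function. This is a sketch assuming the boundaries are sorted and the column holds only strings; like pd.cut with out-of-range numbers, it returns NaN for titles outside the outermost edges.

```python
import numpy as np
import pandas as pd


def alpha_cut(values, boundaries):
    """Hypothetical helper: bin strings into left-closed intervals [b[i], b[i+1]).

    Assumes `values` is a Series of str and `boundaries` is sorted.
    Values below the first edge, or at/after the last edge, get NaN.
    """
    edges = np.asarray(boundaries, dtype=str)
    labels = np.array([f"['{a}','{b}')" for a, b in zip(edges[:-1], edges[1:])], dtype=object)
    # side='right' puts a value equal to an edge into the bin that starts at that edge;
    # subtracting 1 turns "number of edges <= value" into a bin index
    idx = np.searchsorted(edges, np.asarray(values, dtype=str), side='right') - 1
    in_range = (idx >= 0) & (idx < len(labels))
    out = np.full(len(idx), np.nan, dtype=object)
    out[in_range] = labels[idx[in_range]]
    return pd.Series(out, index=values.index)


books = pd.DataFrame([[1, 'In Search of Lost Time'], [2, 'Don Quixote'], [3, 'Ulysses'],
                      [4, 'The Great Gatsby'], [5, 'Moby Dick']], columns=['Book ID', 'Title'])
boundaries = ['AAAAAAA', 'The Great Gatsby', 'zzzzzzzz']
result = books.assign(binning=alpha_cut(books.Title, boundaries))
print(result)
```

The comparisons are plain code-point comparisons, so every uppercase letter sorts before every lowercase one. That is why 'zzzzzzzz' works as an upper bound here, but it also means a title starting with a lowercase letter would fall into the last bin.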
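Extending this to some other boundaries (one bin per letter of the alphabet):
import numpy as np
from string import ascii_uppercase as letters
# Interior edges B..Y; together with A and Z as the outer edges this gives 25 bins [A..B) ... [Y..Z)
# (titles starting with Z also land in the last bin, [Y..Z))
boundaries = np.array([*letters[1:-1]])
bins = np.array([f'[{a}..{b})' for a, b in zip(letters, letters[1:])])
books.assign(binning=bins[boundaries.searchsorted(books.Title, side='right')])
Book ID Title binning
0 1 In Search of Lost Time [I..J)
1 2 Don Quixote [D..E)
2 3 Ulysses [U..V)
3 4 The Great Gatsby [T..U)
4 5 Moby Dick [M..N)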
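pd.cut returns an ordered Categorical rather than plain strings, so bins sort and compare in bin order. Below is a sketch of the per-letter version that does the same through pd.Categorical.from_codes, assuming titles start with an uppercase letter. It uses all 26 letters as edges and adds an open-ended '[Z..)' label (my own choice, not part of the answer above) so titles starting with Z get their own bin. A code of -1 (a title sorting before 'A') is treated by from_codes as missing.

```python
from string import ascii_uppercase as letters

import numpy as np
import pandas as pd

books = pd.DataFrame([[1, 'In Search of Lost Time'], [2, 'Don Quixote'], [3, 'Ulysses'],
                      [4, 'The Great Gatsby'], [5, 'Moby Dick']], columns=['Book ID', 'Title'])

# Edges A..Z: bin i covers [letters[i], letters[i+1]); the last bin, [Z..), is open-ended
edges = np.array(list(letters))
labels = [f'[{a}..{b})' for a, b in zip(letters, letters[1:])] + ['[Z..)']

# Count of edges <= title, minus 1, is the bin index; -1 means "sorts before A"
codes = edges.searchsorted(books['Title'].to_numpy(dtype=str), side='right') - 1

# from_codes maps -1 to NaN and keeps the categories ordered, like pd.cut's output
binning = pd.Categorical.from_codes(codes, categories=labels, ordered=True)
result = books.assign(binning=binning)
print(result)
```

With an ordered categorical, something like result.sort_values('binning') orders rows by bin position instead of by label text.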