I have a string of characters:
abcdefghijklmnopqrstuvwxyz_
I want to take this string and sort its characters by how many times each one appears in a large block of text. For example:
cwrxwzbgickpjbp_svnudntddwdqbfgzyiqpuxddmpvyfquosmicfzkjekxzchngpqaksafulateukuwomdrwza_n_ptzktjzcuibnebe_tqessrzqewgkadrkvtyznaupodanwazopg_fijcoojojbsolr_ejesukzc_quochdnmti_lkvrsegyieqlqysuxdvetkqtkhxaiypfdiddztlicjurnllriopdtuuzpryrsepfydyeg_xkr_ruxp_lgqesysidfsygztwrba_ay_gaqqklbrvr_lbhawjraqujfxptmuvqfzklfodgaqrnhjravksjwemoosdlxtvw_qspxmlvqryusfixzlkb_p_c_tepzozzwnokvqspkizygoqpbhjnsxopchzgapctowbrletrunlgnvzpfwrqgedo_s_ygkxz_mpncnve_gfpbotupawevhfxvqhwlerupjfibosbvhiijrodigzyhy_iijes_xsqorshhdzkjqitpljsftpitjetwmzqiabyiewgtbjaddtsjkckcxxvlyrchloetluxkohn_uihkdjpcqgvejanslakmwendgkmvmayknvjjnr_kdapnumwvz__lsimxdtrflyleykxejl_jbkhexpcyreoapelqzzyriyrbxdgbgwrrxlj_pt_mpwubvbveakxfsbfgj___
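Once I have sorted these characters, I also want to remove the underscore and every character that follows it.
Is recursion the right approach here?
EDIT:
Example of possible output: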
afiskjweocnsdkspwjrhfg
Basically, the characters should be sorted by frequency and printed on a single line.
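The task needs no recursion: one pass over the text to count characters, then one sort, is enough. A minimal sketch, assuming Python and a hypothetical helper name `sort_by_frequency`, ordering the alphabet from most to least frequent:

```python
from collections import Counter

def sort_by_frequency(alphabet: str, text: str) -> str:
    """Hypothetical helper: order the characters of `alphabet`
    from most to least frequent in `text`."""
    counts = Counter(text)  # single pass over the text, no recursion
    # sorted() is stable, so characters with equal counts
    # keep their original order from `alphabet`.
    return "".join(sorted(alphabet, key=lambda c: -counts[c]))

print(sort_by_frequency("abc_", "cbcb_c"))  # cb_a
```

Characters that never appear in the text get a count of 0 and end up last.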
Answer 0 (score: 1)
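<?php
// Sample input; replace it with the large block of text.
$text = 'ahugechunkofatext';
// Mode 1: byte value => count, only for bytes that actually occur in $text.
$charCounts = count_chars($text, 1);
// Sort by count, highest first, keeping the byte values as keys.
arsort($charCounts);
// Convert the byte values back into characters.
$chars = array_map('chr', array_keys($charCounts));
// Drop the characters you don't want in the result.
$chars = array_filter($chars, function ($char) {
    return !in_array($char, ['_'], true); // list of unwanted characters
});
echo implode('', $chars) . PHP_EOL;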
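For comparison, here is a rough Python equivalent of the PHP answer (the function name `frequency_order` is hypothetical). `Counter` plays the role of `count_chars($text, 1)`, `most_common()` the role of `arsort`, and the generator filter the role of `array_filter`. Ties may come out in a different order than in PHP: `most_common()` orders equal counts by first appearance in the text.

```python
from collections import Counter

def frequency_order(text: str, unwanted: str = "_") -> str:
    """Hypothetical helper mirroring the PHP answer:
    count characters, sort by count descending, drop unwanted ones."""
    counts = Counter(text)  # only characters that actually occur
    # most_common() with no argument returns all (char, count) pairs,
    # highest count first.
    return "".join(ch for ch, _n in counts.most_common() if ch not in unwanted)

print(frequency_order("banana_"))  # anb
```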
Answer 1 (score: 0)
You can use collections.Counter to count the characters in the large string:
import collections
walloftext = """cwrxwzbgickpjbp_svnudntddwdqbfgzyiqpuxddmpvyfquosmicfzkjekxzchngpqaksafulateukuwomdrwza_n_ptzktjzcuibnebe_tqessrzqewgkadrkvtyznaupodanwazopg_fijcoojojbsolr_ejesukzc_quochdnmti_lkvrsegyieqlqysuxdvetkqtkhxaiypfdiddztlicjurnllriopdtuuzpryrsepfydyeg_xkr_ruxp_lgqesysidfsygztwrba_ay_gaqqklbrvr_lbhawjraqujfxptmuvqfzklfodgaqrnhjravksjwemoosdlxtvw_qspxmlvqryusfixzlkb_p_c_tepzozzwnokvqspkizygoqpbhjnsxopchzgapctowbrletrunlgnvzpfwrqgedo_s_ygkxz_mpncnve_gfpbotupawevhfxvqhwlerupjfibosbvhiijrodigzyhy_iijes_xsqorshhdzkjqitpljsftpitjetwmzqiabyiewgtbjaddtsjkckcxxvlyrchloetluxkohn_uihkdjpcqgvejanslakmwendgkmvmayknvjjnr_kdapnumwvz__lsimxdtrflyleykxejl_jbkhexpcyreoapelqzzyriyrbxdgbgwrrxlj_pt_mpwubvbveakxfsbfgj___"""
wallcount = collections.Counter(walloftext)
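One property of `Counter` makes the next step safe: looking up a character that never appears returns 0 instead of raising `KeyError`, so every letter of the alphabet can be used as a sort key even if it is missing from the text. A small sketch on a short sample string:

```python
import collections

wallcount = collections.Counter("hello_world")
print(wallcount["l"])            # 3
print(wallcount["z"])            # 0: missing keys count as zero, no KeyError
print(wallcount.most_common(2))  # [('l', 3), ('o', 2)]
```

Note that the lookup of `"z"` does not add `"z"` to the counter.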
Then use those counts to sort the original alphabet:
alphabet = "abcdefghijklmnopqrstuvwxyz_"
sortedalph = sorted(alphabet, key=lambda c: wallcount[c])
(This sorts by increasing frequency, so the least frequent letters come first. If you want the opposite order, put a - in front of wallcount inside the lambda.)
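A sketch of both orderings on a small sample. Negating the key and passing `reverse=True` both give most-frequent-first; since Python's sort is stable in either case, letters with equal counts keep their alphabet order and the two descending results match.

```python
import collections

wallcount = collections.Counter("abbccc")
alphabet = "abcd"

ascending = sorted(alphabet, key=lambda c: wallcount[c])
descending = sorted(alphabet, key=lambda c: -wallcount[c])
also_descending = sorted(alphabet, key=lambda c: wallcount[c], reverse=True)

print("".join(ascending))   # dabc
print("".join(descending))  # cbad
```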
Finally, join the sorted alphabet back into a string and cut off the underscore and everything after it:
finalalph = "".join(sortedalph).split("_")[0]
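The question can be read two ways: drop the underscore and everything after it (this answer), or drop only the underscore (the PHP answer). A short sketch contrasting the two on a sample string; `partition()` is shown as an alternative to `split("_")[0]`:

```python
sorted_text = "cb_ad"

# Keep only what comes before the first underscore.
before_underscore = sorted_text.split("_")[0]
# Remove just the underscore and keep everything else.
without_underscore = sorted_text.replace("_", "")
# partition() splits once, at the first "_".
head, _sep, _tail = sorted_text.partition("_")

print(before_underscore)   # cb
print(without_underscore)  # cbad
```

If the string contains no underscore, `split("_")[0]` simply returns the whole string.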