Sorting characters by count using PHP or Python

Date: 2017-10-13 16:06:14

Tags: php python

I have a string of characters:

abcdefghijklmnopqrstuvwxyz_

I want to take this string and sort its characters by the number of times each one appears in a large block of text. For example:

cwrxwzbgickpjbp_svnudntddwdqbfgzyiqpuxddmpvyfquosmicfzkjekxzchngpqaksafulateukuwomdrwza_n_ptzktjzcuibnebe_tqessrzqewgkadrkvtyznaupodanwazopg_fijcoojojbsolr_ejesukzc_quochdnmti_lkvrsegyieqlqysuxdvetkqtkhxaiypfdiddztlicjurnllriopdtuuzpryrsepfydyeg_xkr_ruxp_lgqesysidfsygztwrba_ay_gaqqklbrvr_lbhawjraqujfxptmuvqfzklfodgaqrnhjravksjwemoosdlxtvw_qspxmlvqryusfixzlkb_p_c_tepzozzwnokvqspkizygoqpbhjnsxopchzgapctowbrletrunlgnvzpfwrqgedo_s_ygkxz_mpncnve_gfpbotupawevhfxvqhwlerupjfibosbvhiijrodigzyhy_iijes_xsqorshhdzkjqitpljsftpitjetwmzqiabyiewgtbjaddtsjkckcxxvlyrchloetluxkohn_uihkdjpcqgvejanslakmwendgkmvmayknvjjnr_kdapnumwvz__lsimxdtrflyleykxejl_jbkhexpcyreoapelqzzyriyrbxdgbgwrrxlj_pt_mpwubvbveakxfsbfgj___

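Once the characters are sorted, I also want to remove the underscore and every character that comes after it.

Is recursion the right idea to be looking at here?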
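Recursion is not needed for this: counting characters is a single pass over the text. A minimal sketch using a plain loop, where `count_chars_loop` is a hypothetical helper name and `"banana_"` is a made-up sample:

```python
def count_chars_loop(text):
    """Count how often each character occurs, in one pass (no recursion)."""
    counts = {}
    for ch in text:
        # dict.get returns 0 the first time a character is seen
        counts[ch] = counts.get(ch, 0) + 1
    return counts

print(count_chars_loop("banana_"))  # {'b': 1, 'a': 3, 'n': 2, '_': 1}
```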

Edit

Example of possible output:

afiskjweocnsdkspwjrhfg

Basically, the characters should be ordered by frequency on a single line.

2 answers:

Answer 0 (score: 1)

<?php

$text = 'ahugechunkofatext';
$charCounts = count_chars($text, 1);
arsort($charCounts);

$chars = array_map('chr', array_keys($charCounts));
$chars = array_filter($chars, function ($char) {
    return !in_array($char, ['_']); // A list of chars that you don't want
});

echo implode('', $chars) . PHP_EOL;
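The PHP answer above removes only the underscore character itself, while the question asks to drop the underscore and everything that follows it in the sorted result. A Python sketch contrasting the two interpretations on one frequency-sorted string; the helper names and the sample text `"aaabb__c"` are hypothetical:

```python
from collections import Counter

def by_frequency(text):
    """Distinct characters of `text`, most frequent first, joined into one string."""
    # most_common() orders equal counts by first appearance (Python 3.7+).
    return "".join(ch for ch, count in Counter(text).most_common())

def drop_underscore(s):
    # Interpretation 1 (what the PHP answer does): remove only the "_" character.
    return s.replace("_", "")

def cut_at_underscore(s):
    # Interpretation 2 (what the question asks): remove "_" and everything after it.
    return s.partition("_")[0]

ordered = by_frequency("aaabb__c")   # a: 3, b: 2, _: 2, c: 1
print(ordered)                       # ab_c
print(drop_underscore(ordered))      # abc
print(cut_at_underscore(ordered))    # ab
```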

Answer 1 (score: 0)

You can use collections.Counter to count the characters in the large string:

import collections
walloftext = """cwrxwzbgickpjbp_svnudntddwdqbfgzyiqpuxddmpvyfquosmicfzkjekxzchngpqaksafulateukuwomdrwza_n_ptzktjzcuibnebe_tqessrzqewgkadrkvtyznaupodanwazopg_fijcoojojbsolr_ejesukzc_quochdnmti_lkvrsegyieqlqysuxdvetkqtkhxaiypfdiddztlicjurnllriopdtuuzpryrsepfydyeg_xkr_ruxp_lgqesysidfsygztwrba_ay_gaqqklbrvr_lbhawjraqujfxptmuvqfzklfodgaqrnhjravksjwemoosdlxtvw_qspxmlvqryusfixzlkb_p_c_tepzozzwnokvqspkizygoqpbhjnsxopchzgapctowbrletrunlgnvzpfwrqgedo_s_ygkxz_mpncnve_gfpbotupawevhfxvqhwlerupjfibosbvhiijrodigzyhy_iijes_xsqorshhdzkjqitpljsftpitjetwmzqiabyiewgtbjaddtsjkckcxxvlyrchloetluxkohn_uihkdjpcqgvejanslakmwendgkmvmayknvjjnr_kdapnumwvz__lsimxdtrflyleykxejl_jbkhexpcyreoapelqzzyriyrbxdgbgwrrxlj_pt_mpwubvbveakxfsbfgj___"""
wallcount = collections.Counter(walloftext)
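A property of `Counter` that the sorting step relies on: looking up a character that never occurs returns 0 instead of raising `KeyError`, and the lookup does not add that key. A short illustration on a made-up sample string (the name `sample_count` is only for this example):

```python
import collections

sample_count = collections.Counter("abracadabra")
print(sample_count["a"])            # 5
print(sample_count["z"])            # 0: a missing key counts as zero
print("z" in sample_count)          # False: the lookup did not insert "z"
print(sample_count.most_common(1))  # [('a', 5)]
```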

Then use those counts to sort the original alphabet:

alphabet = "abcdefghijklmnopqrstuvwxyz_"
sortedalph = sorted(alphabet, key=lambda c: wallcount[c])
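Because Python's `sorted` is stable, letters with equal counts keep their alphabet order, and letters that never appear (count 0) come first in ascending order. A sketch on a hypothetical small alphabet and sample text chosen so that some counts tie:

```python
import collections

# Hypothetical small alphabet and sample text with tied counts.
alphabet = "abcd_"
counts = collections.Counter("ddcc_a")   # d: 2, c: 2, _: 1, a: 1, b: 0
sortedalph = sorted(alphabet, key=lambda c: counts[c])
print("".join(sortedalph))  # ba_cd
```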

(This sorts by increasing frequency, so the least frequent letters come first. If you want the reverse order, put a - in front of wallcount in the lambda.)
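Negating the key and passing `reverse=True` give the same descending order, because Python's sort stays stable with `reverse=True` (ties keep their original alphabet order). Reversing the ascending result afterwards is different: it also flips the ties. A sketch on a hypothetical alphabet `abcd_` and sample text `"ddcc_a"`:

```python
import collections

alphabet = "abcd_"
counts = collections.Counter("ddcc_a")   # d: 2, c: 2, _: 1, a: 1, b: 0

desc_neg = sorted(alphabet, key=lambda c: -counts[c])
desc_rev = sorted(alphabet, key=lambda c: counts[c], reverse=True)
flipped = list(reversed(sorted(alphabet, key=lambda c: counts[c])))

print("".join(desc_neg))  # cda_b
print("".join(desc_rev))  # cda_b (reverse=True keeps ties in original order)
print("".join(flipped))   # dc_ab (reversing the ascending result flips ties too)
```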

Finally, join the sorted alphabet back into a string and cut off the underscore and everything after it:

finalalph = "".join(sortedalph).split("_")[0]
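The steps of this answer can be combined into one function. This is a sketch, not code from the answer: `sort_by_frequency` and its `descending` parameter are hypothetical, and it uses `str.partition`, which splits only at the first underscore and returns the whole string unchanged if there is none (equivalent here to `split("_")[0]`):

```python
import collections

def sort_by_frequency(text, alphabet="abcdefghijklmnopqrstuvwxyz_", descending=False):
    """Order `alphabet` by how often each character occurs in `text`,
    then drop the first underscore and everything after it."""
    counts = collections.Counter(text)  # characters absent from text count as 0
    ordered = sorted(alphabet, key=lambda c: counts[c], reverse=descending)
    # partition("_")[0] is the part before the first "_" (or the whole string).
    return "".join(ordered).partition("_")[0]

print(sort_by_frequency("ddcc_a", alphabet="abcd_"))                   # ba
print(sort_by_frequency("ddcc_a", alphabet="abcd_", descending=True))  # cda
```

With the question's text, `sort_by_frequency(walloftext)` returns the same string as `finalalph` above.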