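Creating a compress function in Python?

Time: 2015-09-30 00:20:24

Tags: python compression

I need to create a function called compress that compresses a string by replacing any repeated letters with the letter followed by a number. My function should return the shortened version of the string. I have managed to count the first character, but not the others.

For example: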

>>> compress("ddaaaff")
'd2a3f2'


 def compress(s):
     count=0

     for i in range(0,len(s)):
         if s[i] == s[i-1]:
             count += 1
         c = s.count(s[i])

     return str(s[i]) + str(c)

22 answers:

Answer 0 (score: 6)

Here is a short Python implementation of a compress function:

def compress(string):
    # An empty string has no first character to seed the result with
    if not string:
        return ""

    res = ""

    count = 1

    # Add in first character
    res += string[0]

    # Iterate through the string, comparing each character with the next one
    for i in range(len(string)-1):
        if(string[i] == string[i+1]):
            count+=1
        else:
            if(count > 1):
                # Ignore if no repeats
                res += str(count)
            res += string[i+1]
            count = 1
    # Append the count of the last run
    if(count > 1):
        res += str(count)
    return res

Here are some examples:

>>> compress("ddaaaff")
'd2a3f2'
>>> compress("daaaafffyy")
'da4f3y2'
>>> compress("mississippi")
'mis2is2ip2i'

Answer 1 (score: 3)

A short version using a generator:

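from itertools import groupby
def compress(string):
    # Emit the count only for runs longer than 1; a blanket .replace('1', '') would also corrupt counts such as 10
    counts = ((char, sum(1 for _ in group)) for char, group in groupby(string))
    return ''.join(char + (str(n) if n > 1 else '') for char, n in counts)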

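(1) Use groupby(string) to group consecutive identical characters

(2) Use sum(1 for _ in group) to count the length of each group (because a group has no len)

(3) Join the pieces in the proper format

(4) Omit the 1 for single-character groups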
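
To make the four steps concrete, the sketch below shows what groupby yields for the sample string before anything is joined. The helper name runs is only for illustration and is not part of the answer.

```python
from itertools import groupby

def runs(string):
    """Return (char, run_length) pairs for consecutive equal characters."""
    return [(char, sum(1 for _ in group)) for char, group in groupby(string)]

print(runs("ddaaaff"))      # [('d', 2), ('a', 3), ('f', 2)]
print(runs("mississippi"))
```

Note that groupby only groups neighbours, so the two separate runs of 's' in "mississippi" come out as two separate pairs.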

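Answer 2 (score: 2)

There are several reasons why this doesn't work. You really need to try debugging it yourself first. Put in some print statements to trace the execution. For example: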

def compress(s):
    count=0

    for i in range(0, len(s)):
        print("Checking character", i, s[i])
        if s[i] == s[i-1]:
            count += 1
        c = s.count(s[i])
        print("Found", s[i], c, "times")

    return str(s[i]) + str(c)

print(compress("ddaaaff"))

Here is the output:

Checking character 0 d
Found d 2 times
Checking character 1 d
Found d 2 times
Checking character 2 a
Found a 3 times
Checking character 3 a
Found a 3 times
Checking character 4 a
Found a 3 times
Checking character 5 f
Found f 2 times
Checking character 6 f
Found f 2 times
f2

Process finished with exit code 0

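(1) You throw away every result except the one for the last letter. (2) You count all occurrences of a letter, not just the consecutive ones. (3) You convert a string to a string, which is redundant.

Try working through this example with pencil and paper. Write down the steps you use as a human to parse the string. Then work on translating those steps into Python.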
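
Point (2) is the easiest one to miss, so here is a small sketch of the difference. str.count counts every occurrence in the whole string, while a run length counts only the matching neighbours. The helper run_length_at is a hypothetical name used only for illustration.

```python
def run_length_at(s, start):
    """Length of the run of identical characters beginning at index start."""
    end = start
    while end < len(s) and s[end] == s[start]:
        end += 1
    return end - start

s = "aabaa"
print(s.count("a"))          # 4: every 'a' in the string
print(run_length_at(s, 0))   # 2: only the first consecutive run
print(run_length_at(s, 3))   # 2: the second run is counted separately
```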

Answer 4 (score: 1)

Another simple way to do this:

def compress(str1):
    output = ''
    initial = str1[0]
    output = output + initial
    count = 1
    for item in str1[1:]:
        if item == initial:
            count = count + 1
        else:
            if count == 1:
                count = ''
            output = output + str(count)
            count = 1
            initial = item
            output = output + item
    # Flush the count of the final run, which the loop never reaches
    if count > 1:
        output = output + str(count)
    print (output)

It gives the output as required, for example:

>> compress("aaaaaaaccddddeehhyiiiuuo")
a7c2d4e2h2yi3u2o

>> compress("lllhhjuuuirrdtt")
l3h2ju3ir2dt2

>> compress("mississippi")
mis2is2ip2i

Answer 5 (score: 0)

from collections import Counter

def char_count(input_str):
    my_dict = Counter(input_str)
    print(my_dict)
    output_str = ""
    for i in input_str:
        if i not in output_str:
            output_str += i
            output_str += str(my_dict[i])
    return output_str

result = char_count("zddaaaffccc")
print(result)

Answer 7 (score: 0)

Using a generator:

from itertools import groupby

# Named text rather than input to avoid shadowing the built-in input()
text = "aaaddddffwwqqaattttttteeeeeee"
print(''.join(char + str(len(list(group))) for char, group in groupby(text)))

Answer 8 (score: 0)

This is a modification of Patrick Yu's code. His code fails on the following test cases.

Sample input:
c
aaaaaaaaaabcdefgh

Expected output:
c1
a10b1c1d1e1f1g1h1

Output of Patrick's code:
c
a10bcdefgh

Here is the modified version of his code:


def Compress(S):
    Ans = S[0]
    count = 1
    for i in range(len(S)-1):
        if S[i] == S[i+1]:
            count += 1
        else:
            if count >= 1:
                Ans += str(count)
            Ans += S[i+1]
            count = 1
    if count>=1:
        Ans += str(count)
    return Ans

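When comparing count with 1, the condition has to be changed from greater than (">") to greater than or equal to (">="), so that a count of 1 is written out as well.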
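
The two conventions in this thread differ only in whether a run of length 1 gets a count. Here is a minimal sketch that makes the choice explicit. The keep_ones parameter is a hypothetical name introduced for illustration; it does not appear in either answer.

```python
def compress(s, keep_ones=False):
    """Run-length encode s; write counts of 1 only when keep_ones is True."""
    if not s:
        return ""
    parts = []
    count = 1
    # Compare each character with its successor.
    for prev, cur in zip(s, s[1:]):
        if cur == prev:
            count += 1
        else:
            parts.append(prev + (str(count) if keep_ones or count > 1 else ""))
            count = 1
    # Flush the final run.
    parts.append(s[-1] + (str(count) if keep_ones or count > 1 else ""))
    return "".join(parts)

print(compress("aaaaaaaaaab"))                  # a10b
print(compress("aaaaaaaaaab", keep_ones=True))  # a10b1
```

Always writing the count makes the output longer for strings with few repeats, but every character is then followed by a number, which keeps the format regular.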

Answer 9 (score: 0)

Using re from Python's standard library:
def compress(string):
    import re
    p=r'(\w+?)\1+' # non greedy, group1 1
    sub_str=string
    for m in re.finditer(p,string):
        num=m[0].count(m[1])
        sub_str=re.sub(m[0],f'{m[1]}{num}',sub_str)
    return sub_str

Examples:

string='aaaaaaaabbbbbbbbbcccccccckkkkkkkkkkkppp'
string2='ababcdcd'
string3='abcdabcd' 
string4='ababcdabcdefabcdcd' 

print(compress(string))
print(compress(string2))
print(compress(string3))
print(compress(string4))

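Answer 11 (score: 0)

For a coding interview, where it is about the algorithm rather than my knowledge of Python, its internal representation of data structures, or the time complexity of operations such as string concatenation: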

def compress(message: str) -> str:
    output = ""
    length = 0
    previous: str = None
    for char in message:
        if previous is None or char == previous:
            length += 1
        else:
            output += previous
            if length > 1:
                output += str(length)
            length = 1
        previous = char
    if previous is not None:
        output += previous
        if length > 1:
            output += str(length)
    return output

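For code I would actually use in production: don't reinvent any wheels, make it more testable, use iterators until the last step for space efficiency, and use join() instead of string concatenation for time efficiency: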

from itertools import groupby
from typing import Iterator


def compressed_groups(message: str) -> Iterator[str]:
    for char, group in groupby(message):
        length = sum(1 for _ in group)
        yield char + (str(length) if length > 1 else "")


def compress(message: str) -> str:
    return "".join(compressed_groups(message))

Going a step further, for more testability:

from itertools import groupby
from typing import Iterator
from collections import namedtuple


class Segment(namedtuple('Segment', ['char', 'length'])):

    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")


def segments(message: str) -> Iterator[Segment]:
    for char, group in groupby(message):
        yield Segment(char, sum(1 for _ in group))


def compress(message: str) -> str:
    return "".join(str(s) for s in segments(message))
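
One way to read the testability claim: with segments exposed, a test can assert on the run structure directly instead of parsing a formatted string. The sketch below uses plain assert statements. The test function names are illustrative, and Segment and segments are repeated so that the block runs on its own.

```python
from collections import namedtuple
from itertools import groupby
from typing import Iterator


class Segment(namedtuple('Segment', ['char', 'length'])):

    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")


def segments(message: str) -> Iterator[Segment]:
    for char, group in groupby(message):
        yield Segment(char, sum(1 for _ in group))


def test_segments_keep_run_structure():
    # The run lengths are checked as data, before any formatting happens.
    assert list(segments("aab")) == [Segment('a', 2), Segment('b', 1)]


def test_single_character_segment_has_no_count():
    # Formatting is checked separately, one segment at a time.
    assert str(Segment('b', 1)) == "b"
    assert str(Segment('a', 12)) == "a12"


test_segments_keep_run_structure()
test_single_character_segment_has_no_count()
```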

Going all out and providing a value object, CompressedString:

from itertools import groupby
from typing import Iterator
from collections import namedtuple


class Segment(namedtuple('Segment', ['char', 'length'])):

    def __str__(self) -> str:
        return self.char + (str(self.length) if self.length > 1 else "")


class CompressedString(str):

    @classmethod
    def compress(cls, message: str) -> "CompressedString":
        return cls("".join(str(s) for s in cls._segments(message)))

    @staticmethod
    def _segments(message: str) -> Iterator[Segment]:
        for char, group in groupby(message):
            yield Segment(char, sum(1 for _ in group))


def compress(message: str) -> str:
    return CompressedString.compress(message)

Answer 12 (score: 0)

from collections import Counter
def string_compression(string):
    counter = Counter(string)
    result = ''
    for k, v in counter.items():
        result = result + k + str(v)
    print(result)

Answer 13 (score: 0)

string = 'aabccccd'  # prints '2a1b4c1d'

new_string = ""
count = 1
for i in range(len(string)-1):
    if string[i] == string[i+1]:
        count = count + 1
    else:
        new_string = new_string + str(count) + string[i]
        count = 1
# Append the final run; string[-1] also works for a one-character string
new_string = new_string + str(count) + string[-1]
print(new_string)

Answer 14 (score: 0)

I wanted to do this by splitting the string first, so that aabbcc becomes: ['aa','bb','cc']
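
A minimal sketch of that splitting idea, assuming the split is done with itertools.groupby and that a count of 1 is omitted (both are assumptions, and the names split_runs and compress_by_splitting are hypothetical): first cut the string into runs of equal characters, then encode each run as its character plus its length.

```python
from itertools import groupby

def split_runs(s):
    """Split s into runs of identical consecutive characters."""
    return ["".join(group) for _, group in groupby(s)]

def compress_by_splitting(s):
    """Encode each run as character + length, omitting a length of 1."""
    return "".join(run[0] + (str(len(run)) if len(run) > 1 else "")
                   for run in split_runs(s))

print(split_runs("aabbcc"))              # ['aa', 'bb', 'cc']
print(compress_by_splitting("ddaaaff"))  # d2a3f2
```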

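Answer 15 (score: 0)

The following logic works irrespective of:

  1. Data structures
  2. groupby, set, or any other kind of compression logic
  3. Uppercase or lowercase letters
  4. Whether the repeated characters are consecutive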

    def fstrComp_1(stng):
        sRes = ""
        # Fold case once so that 'B' and 'b' are counted together
        stng = stng.lower()
        for i in range(len(stng)):
            # Handle each distinct character only the first time it appears
            if stng[i] not in sRes:
                n = stng.count(stng[i])
                if n > 1:
                    sRes += stng[i] + str(n)
                else:
                    sRes += stng[i]

        print(sRes)

    fstrComp_1("aB*b?cC&")

Answer 16 (score: 0)

Here is a short Python implementation of a compress function:

#d=compress('xxcccdex')
#print(d)

def compress(word):
    list1=[]
    for i in range(len(word)):
        list1.append(word[i].lower())
    num=0
    dict1={}
    for i in range(len(list1)):
        if(list1[i] in list(dict1.keys())):
            dict1[list1[i]]=dict1[list1[i]]+1
        else:
            dict1[list1[i]]=1

    s=list(dict1.keys())
    v=list(dict1.values())
    word=''
    for i in range(len(s)):
        word=word+s[i]+str(v[i])
    return word

Answer 17 (score: 0)

You can achieve this simply as follows:

gstr="aaabbccccdddee"
last=gstr[0]
count=0
rstr=""
for i in gstr:
    if i==last:
        count=count+1
    elif i!=last:
        rstr=rstr+last+str(count)
        count=1
        last=i
rstr=rstr+last+str(count)
print ("Required string for given string {} after conversion is {}.".format(gstr,rstr))

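Answer 18 (score: 0)

Here is a way to solve the problem. Keep in mind, though, that this method only pays off when there is a lot of repetition, specifically of consecutive characters. Otherwise it only makes things worse.

For example,
      AABCD -> A2B1C1D1
      BcDG -> B1c1D1G1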

def compress_string(s):
    result = [""] * len(s)
    visited = None

    index = -1  # moves to 0 when the first run starts, so result[len(s)] is never written
    count = 1

    for c in s:
        if c == visited:
            count += 1
            result[index] = f"{c}{count}"
        else:
            count = 1
            index += 1
            result[index] = f"{c}{count}"
            visited = c

    return "".join(result)
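
The caveat about strings with few repeats is easy to check by comparing lengths. The sketch below uses a compact groupby-based encoder that produces the same always-write-the-count format as this answer. The name rle is a hypothetical helper, not the function above.

```python
from itertools import groupby

def rle(s):
    """Encode every run as character + count, including counts of 1."""
    return "".join(f"{char}{sum(1 for _ in group)}" for char, group in groupby(s))

for text in ("AAAAAAAABBBB", "AABCD", "BcDG"):
    encoded = rle(text)
    print(text, "->", encoded, f"({len(text)} -> {len(encoded)} characters)")
```

Only the first string gets shorter; the other two grow, because every unrepeated character costs one extra digit.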

Answer 19 (score: 0)

Here is something I wrote.

def stringCompression(str1):
  counter=0
  prevChar = str1[0]
  str2=""
  charChanged = False
  loopCounter = 0

  for char in str1:
      if(char==prevChar):
          counter+=1
          charChanged = False
      else:
          str2 += prevChar + str(counter)
          counter=1
          prevChar = char
          if(loopCounter == len(str1) - 1):
              str2 += prevChar + str(counter)
          charChanged = True
      loopCounter+=1
  if(not charChanged):
      str2+= prevChar + str(counter)

  return str2

Probably not the best code, I guess, but it works well.

a -> a1

aaabbbccc -> a3b3c3

Answer 20 (score: 0)

s=input("Enter the string:")
temp={}
result=" "
for x in s:
    if x in temp:
        temp[x]=temp[x]+1
    else:
        temp[x]=1
for key,value in temp.items():
    result+=str(key)+str(value)

print(result)

Answer 21 (score: 0)

input = "mississippi"
count = 1
for i in range(1, len(input) + 1):
    if i == len(input):
        print(input[i - 1] + str(count), end="")
        break
    else:
        if input[i - 1] == input[i]:
            count += 1
        else:
            print(input[i - 1] + str(count), end="")
            count = 1

Output: m1i1s2i1s2i1p2i1