Python - Converting a string into a table using rsplit

Time: 2018-01-14 20:28:28

Tags: python regex

I'm new to Python and need some help with a string I have:

string='Starters\nSalad with Greens 14.00\nSalad Goat Cheese 12.75\nMains\nPizza 12.75\nPasta 12.75\n'

and I need to convert it into a table that looks more like this:

Category   Dish                Price
Starters   Salad with Greens   14.00
Starters   Salad Goat Cheese   12.75
Mains      Pizza               12.75
Mains      Pasta               12.75

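What is the best way to achieve this?

I tried applying string.rsplit(" ", 2), but couldn't figure out how to do that for every line. I also don't know how to repeat the heading in a separate column. Any help would be appreciated.

Thanks in advance!

4 answers:

Answer 0 (score: 2)

I think you have to decide how to tell categories and items apart. I assume an item is a line that carries a price. This code checks whether the line contains a dot, but you should probably use a regexp.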

s = 'Starters\nSalad with Greens 14.00\nSalad Goat Cheese 12.75\nMains\nPizza 12.75\nPasta 12.75'
items = s.split('\n')
# ['Starters', 'Salad with Greens 14.00', 'Salad Goat Cheese 12.75', 'Mains', 'Pizza 12.75', 'Pasta 12.75']

category = ''
menu = {}
for item in items:
    print(item)
    if '.' in item:
        menu[category].append(item)
    else:
        category = item
        menu[category] = []
print(menu)

# {'Starters': ['Salad with Greens 14.00', 'Salad Goat Cheese 12.75'], 'Mains': ['Pizza 12.75', 'Pasta 12.75']}
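To get from this dictionary to the three-column table asked for in the question, each item still has to be split into dish and price. A minimal sketch, assuming the price is always the last space-separated token on the line; `menu_to_rows` is a hypothetical helper name, not part of the answer above:

```python
def menu_to_rows(menu):
    """Flatten {category: ['Dish 1.23', ...]} into (category, dish, price) rows."""
    rows = []
    for category, items in menu.items():
        for item in items:
            # rsplit with maxsplit=1 cuts only at the last space,
            # so multi-word dish names stay intact.
            dish, price = item.rsplit(' ', 1)
            rows.append((category, dish, price))
    return rows


menu = {
    'Starters': ['Salad with Greens 14.00', 'Salad Goat Cheese 12.75'],
    'Mains': ['Pizza 12.75', 'Pasta 12.75'],
}

print('Category\tDish\tPrice')
for row in menu_to_rows(menu):
    print('\t'.join(row))
```

Splitting from the right is what makes this work for dish names that contain spaces; a plain split(' ') would break "Salad with Greens" into three pieces.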

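UPD: you can replace

if '.' in item:

with

if re.search(r"\d\.\d\d$", item):

This looks for lines that end in a price such as 1.11, which is useful if a category name contains an abbreviation with a dot. It needs import re at the top of the script.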
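To see why a price pattern is safer than a bare dot check, here is a small sketch. The category name "St. Patrick's Specials" is made-up test data, `looks_like_item` is a hypothetical helper, and the pattern assumes every price ends the line with exactly two decimal places:

```python
import re

# Escaped dot and an end-of-line anchor: the line must END in something like 12.75.
PRICE_AT_END = re.compile(r"\d+\.\d\d$")


def looks_like_item(line):
    """Return True if the line ends in a price, False otherwise."""
    return PRICE_AT_END.search(line) is not None


lines = ["St. Patrick's Specials", "Salad with Greens 14.00", "Mains"]
for line in lines:
    # Columns: the line, the naive dot check, the regex check.
    print(repr(line), '.' in line, looks_like_item(line))
```

The naive check treats the abbreviated category as an item because it contains a dot, while the anchored pattern only accepts lines that actually end in a price.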

Answer 1 (score: 1)

Not that I would use this in production, but for the sake of an academic challenge:

import re

string = """Starters
Salad with Greens 14.00
Salad Goat Cheese 12.75
Mains
Pizza 12.75
Pasta 12.75"""

rx = re.compile(r'^(Starters|Mains)', re.MULTILINE)

result = "\n".join(["{}\t{}".format(category, line)
                for parts in [[part.strip() for part in rx.split(string) if part]]
                for category, dish in zip(parts[0::2], parts[1::2])
                for line in dish.split("\n")])
print(result)

This produces

Starters    Salad with Greens 14.00
Starters    Salad Goat Cheese 12.75
Mains   Pizza 12.75
Mains   Pasta 12.75
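The comprehension above depends on one detail of re.split: when the pattern contains a capturing group, the matched text is kept in the result list. A small sketch of that step on a shortened version of the question's data (the variable names are illustrative only):

```python
import re

text = "Starters\nSalad with Greens 14.00\nMains\nPizza 12.75"
rx = re.compile(r'^(Starters|Mains)', re.MULTILINE)

# Because of the capturing group, the category names stay in the list,
# alternating with the text that follows each of them.
parts = rx.split(text)
print(parts)

# Drop the empty leading piece, trim newlines, then pair category with its block.
cleaned = [part.strip() for part in parts if part]
pairs = list(zip(cleaned[0::2], cleaned[1::2]))
print(pairs)
```

Without the parentheses in the pattern, the category names would be consumed as separators and lost, so the pairing with zip would not be possible.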

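Answer 2 (score: 0)

Try this. Note: it assumes 'Starters' is listed before 'Mains'.

# string is the menu string from the question
category = 'Starters'
for item in string.split('\n'):
    if not item.strip(): continue  # skip the empty piece left by the trailing '\n'
    if item == 'Mains': category = 'Mains'
    if item in ('Starters', 'Mains'): continue

    # split only at the last space so multi-word dish names stay intact
    dish, price = item.rsplit(' ', 1)
    print('{} {} {}'.format(category, dish, price))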
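If the goal is output that actually lines up in columns, as in the question, the rows can be padded to a common width. A minimal sketch, assuming the rows have already been split into (category, dish, price) tuples; `format_table` is a hypothetical helper, not part of this answer:

```python
def format_table(header, rows):
    """Return the header and rows as left-aligned, space-padded text columns."""
    table = [tuple(header)] + [tuple(row) for row in rows]
    # Width of each column is the length of its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    lines = []
    for row in table:
        padded = (cell.ljust(width) for cell, width in zip(row, widths))
        lines.append('  '.join(padded).rstrip())
    return '\n'.join(lines)


header = ('Category', 'Dish', 'Price')
rows = [
    ('Starters', 'Salad with Greens', '14.00'),
    ('Starters', 'Salad Goat Cheese', '12.75'),
    ('Mains', 'Pizza', '12.75'),
    ('Mains', 'Pasta', '12.75'),
]

print(format_table(header, rows))
```

Computing the widths from the data avoids hard-coding column sizes, so a longer dish name does not break the alignment.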

Answer 3 (score: 0)

You can use a class-based solution in Python 3, with operator overloading to get more convenient access to the data:

import re
import itertools
class MealPlan:
    def __init__(self, string, headers):
        self.headers = headers
        # alternating groups: [category line], [its item lines], ...
        self.grouped_data = [list(b) for _, b in itertools.groupby(string.split('\n'), key=lambda x:x in ['Starters', 'Mains'])]
        self.final_grouped_data = list(map(lambda x:[x[0][0], x[-1]], [self.grouped_data[i:i+2] for i in range(0, len(self.grouped_data), 2)]))
        # split each item at the whitespace before the price
        self.final_data = [[[a, *list(filter(None, re.split(r'\s(?=\d)', i)))] for i in b] for a, b in self.final_grouped_data]
        # drop rows that came from empty lines (they contain only the category)
        self.final_data = [list(filter(lambda x:len(x) > 1, i)) for i in self.final_data]
    def __getattr__(self, column):
        if column not in self.headers:
            raise KeyError("'{}' not found".format(column))
        transposed = [dict(zip(self.headers, i)) for i in itertools.chain.from_iterable(self.final_data)]
        yield from map(lambda x:x[column], transposed)
    def __getitem__(self, row):
        new_grouped_data = {a:dict(zip(self.headers[1:], zip(*[i[1:] for i in list(b)]))) for a, b in itertools.groupby(list(itertools.chain(*self.final_data)), key=lambda x:x[0])}
        return new_grouped_data[row]
    def __repr__(self):
        return ' '.join(self.headers)+'\n'+'\n'.join('\n'.join(' '.join(c) for c in i) for i in self.final_data)

string='Starters\nSalad with Greens 14.00\nSalad Goat Cheese 12.75\nMains\nPizza 12.75\nPasta 12.75\n' 
meal = MealPlan(string, ['Category', 'Dish', 'Price'])
print(meal)
print([i for i in meal.Category])
print(meal['Starters'])

Output:

Category Dish Price
Starters Salad with Greens 14.00
Starters Salad Goat Cheese 12.75
Mains Pizza 12.75
Mains Pasta 12.75
['Starters', 'Starters', 'Mains', 'Mains']
{'Dish': ('Salad with Greens', 'Salad Goat Cheese'), 'Price': ('14.00', '12.75')}