Question

I'm new to Python and need some help with a string I have:

string='Starters\nSalad with Greens 14.00\nSalad Goat Cheese 12.75\nMains\nPizza 12.75\nPasta 12.75\n'

and I need to turn it into a table that looks more like this:

Category   Dish                Price
Starters   Salad with Greens   14.00
Starters   Salad Goat Cheese   12.75
Mains      Pizza               12.75
Mains      Pasta               12.75

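What is the best way to do this?

I tried applying string.rsplit(" ", 2), but couldn't figure out how to do that for every line. I also don't know how to repeat the category heading in a separate column. Any help would be greatly appreciated.

Thanks in advance!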
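Regarding the rsplit attempt in the question: str.rsplit(sep, maxsplit) splits from the right and stops after maxsplit splits. With maxsplit=1 only the last space-separated token (the price) is split off, while maxsplit=2 also breaks the dish name apart. Applying it to every line is a matter of looping over splitlines(). A small sketch using the question's own string:

```python
string = 'Starters\nSalad with Greens 14.00\nSalad Goat Cheese 12.75\nMains\nPizza 12.75\nPasta 12.75\n'

line = 'Salad with Greens 14.00'
two_splits = line.rsplit(' ', 2)  # ['Salad with', 'Greens', '14.00']: dish name is broken
one_split = line.rsplit(' ', 1)   # ['Salad with Greens', '14.00']: only the price is split off

# Apply rsplit to every line; splitlines() does not produce an empty last element
# for the trailing '\n'.
per_line = [l.rsplit(' ', 1) for l in string.splitlines()]
print(per_line)
```

Lines that come back as a single-element list (['Starters'], ['Mains']) are the category headings, which is one way to detect them.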

Answer 1

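I think you have to decide how to tell categories and items apart. I'd say an item is a line that has a price. This code checks whether the line contains a dot, but you should probably use a regexp.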

s = 'Starters\nSalad with Greens 14.00\nSalad Goat Cheese 12.75\nMains\nPizza 12.75\nPasta 12.75'
items = s.split('\n')
# ['Starters', 'Salad with Greens 14.00', 'Salad Goat Cheese 12.75', 'Mains', 'Pizza 12.75', 'Pasta 12.75']

category = ''
menu = {}
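for item in items:
    print(item)
    if '.' in item:
        # A line containing a dot is treated as a dish with a price.
        menu[category].append(item)
    else:
        # Any other line starts a new category.
        category = item
        menu[category] = []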
print(menu)

# {'Starters': ['Salad with Greens 14.00', 'Salad Goat Cheese 12.75'], 'Mains': ['Pizza 12.75', 'Pasta 12.75']}
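To get from the menu dictionary printed above to the three-column table in the question, each item can be split once from the right to separate the dish from its price. A sketch, with menu written out literally to match that printed dict and tab-separated output chosen for illustration:

```python
menu = {'Starters': ['Salad with Greens 14.00', 'Salad Goat Cheese 12.75'],
        'Mains': ['Pizza 12.75', 'Pasta 12.75']}

rows = [('Category', 'Dish', 'Price')]
for category, items in menu.items():  # dicts keep insertion order (Python 3.7+)
    for item in items:
        dish, price = item.rsplit(' ', 1)  # split off only the last token
        rows.append((category, dish, price))

for row in rows:
    print('\t'.join(row))
```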

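UPD: You can replace

if '.' in item:

with

if re.search(r"\d\.\d\d$", item):

(this needs import re at the top of the script). It matches only lines that end in a price such as 1.11, which is useful if your category names contain abbreviations with a dot.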
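A quick check of how a price pattern classifies lines. In the stricter form re.search(r"\d\.\d\d$", ...), the escaped \. matches only a literal dot and $ anchors the match to the end of the line. By contrast, an unescaped dot matches any character, so re.match(r".*\d.\d\d", ...) also accepts a line like "Serves 1200". The sample lines "Chef's Spec. Menu" and "Serves 1200" are made up for illustration:

```python
import re

# Stricter price check: a literal dot (\.) and an end-of-line anchor ($).
PRICE_AT_END = re.compile(r"\d\.\d\d$")

lines = ['Starters', 'Salad with Greens 14.00', "Chef's Spec. Menu", 'Pizza 12.75']
is_dish = [bool(PRICE_AT_END.search(line)) for line in lines]
print(is_dish)

# The unescaped '.' in the looser pattern matches any character, here the '2'.
loose = bool(re.match(r".*\d.\d\d", "Serves 1200"))
strict = bool(PRICE_AT_END.search("Serves 1200"))
print(loose, strict)
```

Note that the plain '.' in item check would wrongly treat "Chef's Spec. Menu" as a dish.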

Answer 2

Not that I would use this in production, but as an academic challenge:

import re

string = """Starters
Salad with Greens 14.00
Salad Goat Cheese 12.75
Mains
Pizza 12.75
Pasta 12.75"""

rx = re.compile(r'^(Starters|Mains)', re.MULTILINE)

result = "\n".join(["{}\t{}".format(category, line)
                for parts in [[part.strip() for part in rx.split(string) if part]]
                for category, dish in zip(parts[0::2], parts[1::2])
                for line in dish.split("\n")])
print(result)

This produces

Starters    Salad with Greens 14.00
Starters    Salad Goat Cheese 12.75
Mains   Pizza 12.75
Mains   Pasta 12.75
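This answer works because re.split keeps the separators in its result when the pattern contains a capturing group, so category names and their blocks of dish lines alternate. That is what parts[0::2] and parts[1::2] rely on. A short illustration on a shortened, made-up menu:

```python
import re

rx = re.compile(r'^(Starters|Mains)', re.MULTILINE)
text = "Starters\nSalad 14.00\nMains\nPizza 12.75"

pieces = rx.split(text)
# The leading '' comes from the match at position 0; the captured names are kept.
parts = [p.strip() for p in pieces if p]
categories, dishes = parts[0::2], parts[1::2]
print(pieces)
print(categories, dishes)
```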

Answer 3

Try this. Note: it assumes 'Starters' is listed before 'Mains'.

category = 'Starters'
for item in string.split('\n'):
    if not item:
        continue  # skip the empty string produced by the trailing '\n'
    if item == 'Mains': category = 'Mains'
    if item in ('Starters', 'Mains'): continue

    price = item.split(' ')[-1]
    dish = ' '.join(item.split(' ')[:-1])
    print('{} {} {}'.format(category, dish, price))
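The two split(' ') calls in this answer take the last word as the price and rejoin the rest as the dish name. str.rsplit(' ', 1) does the same in one step, as this check on one of the question's lines shows:

```python
item = 'Salad Goat Cheese 12.75'

# Approach used in the answer: split on every space, then rejoin all but the last word.
price = item.split(' ')[-1]
dish = ' '.join(item.split(' ')[:-1])

# Equivalent single call: split once, starting from the right.
dish_r, price_r = item.rsplit(' ', 1)
print(dish, price, dish_r, price_r)
```

One difference: for a line with no space at all, rsplit returns a single element, so unpacking it into two names raises ValueError, whereas the split/join version silently returns an empty dish.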

Answer 4

In Python 3 you can use a class-based solution with operator overloading to get additional ways of accessing the data:

import re
import itertools

class MealPlan:
    def __init__(self, string, headers):
        self.headers = headers
        # Split into runs of consecutive lines: category names vs. dish lines.
        self.grouped_data = [list(b) for a, b in itertools.groupby(string.split('\n'), key=lambda x: x in ['Starters', 'Mains'])]
        # Pair each category name with the list of dish lines that follows it.
        self.final_grouped_data = list(map(lambda x: [x[0][0], x[-1]], [self.grouped_data[i:i+2] for i in range(0, len(self.grouped_data), 2)]))
        # Split each dish line at the whitespace before the price: [category, dish, price].
        self.final_data = [[[a, *filter(None, re.split(r'\s(?=\d)', i))] for i in b] for a, b in self.final_grouped_data]
        # Drop rows without a dish (e.g. the empty line from the trailing newline).
        self.final_data = [list(filter(lambda x: len(x) > 1, i)) for i in self.final_data]

    def __getattr__(self, column):
        # Only called for attributes not found normally, e.g. meal.Category.
        if column not in self.headers:
            raise AttributeError("'{}' not found".format(column))
        transposed = [dict(zip(self.headers, i)) for i in itertools.chain.from_iterable(self.final_data)]
        return (row[column] for row in transposed)

    def __getitem__(self, row):
        new_grouped_data = {a: dict(zip(self.headers[1:], zip(*[i[1:] for i in b]))) for a, b in itertools.groupby(itertools.chain(*self.final_data), key=lambda x: x[0])}
        return new_grouped_data[row]

    def __repr__(self):
        return ' '.join(self.headers) + '\n' + '\n'.join('\n'.join(' '.join(c) for c in i) for i in self.final_data)

string='Starters\nSalad with Greens 14.00\nSalad Goat Cheese 12.75\nMains\nPizza 12.75\nPasta 12.75\n' 
meal = MealPlan(string, ['Category', 'Dish', 'Price'])
print(meal)
print([i for i in meal.Category])
print(meal['Starters'])

Output:

Category Dish Price
Starters Salad with Greens 14.00
Starters Salad Goat Cheese 12.75
Mains Pizza 12.75
Mains Pasta 12.75
['Starters', 'Starters', 'Mains', 'Mains']
{'Dish': ('Salad with Greens', 'Salad Goat Cheese'), 'Price': ('14.00', '12.75')}
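The first step in MealPlan.__init__ relies on itertools.groupby grouping consecutive lines that share the same key, here whether the line is one of the hard-coded category names. This sketch shows the intermediate groups for the question's string:

```python
import itertools

string = 'Starters\nSalad with Greens 14.00\nSalad Goat Cheese 12.75\nMains\nPizza 12.75\nPasta 12.75\n'

# Key is True for category lines, False for dish lines (and for the trailing '').
groups = [list(g) for _, g in itertools.groupby(string.split('\n'),
                                                 key=lambda x: x in ['Starters', 'Mains'])]
print(groups)
```

Pairing every two groups then gives each category with its dishes; the empty string from the trailing newline ends up in the last group and is filtered out later in the class.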

Python - Convert a string to a table using rsplit
