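Iterate over a string and append values to a list using pandas

Time: 2020-01-14 22:05:58

Tags: python pandas e-commerce

Hi, I'm trying to extract sizes from a pandas DataFrame and append them to a list.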

Variations
Size| Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5xXlarge; 
Size| Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5xlarge; 
Sizes| Small - ( only one mic tab); Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5Xlarge; 
Sizes| Small - ( only one mic tab); Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5Xlarge; 
SIZE - COLOR| L/XL - Lime; 2XL/3XL - Lime; 

Here is what I have tried so far.

def size_extractor(data):

    size_list = []

    for char in data:
        if char == "|":

            if char == " ":
                continue

                size_list.append(char)

            elif char == ";":
                continue

    print(size_list)

df['Variations'].apply(size_extractor)

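I'm trying to start the extraction at "|" and grab the characters between " " and ";".

The end result should be a list like this: [Medium, Large, Xlarge, 2Xlarge, 3Xlarge, 4Xlarge, 5xXlarge]

Should I redo this with a while loop?

2 Answers:

Answer 0: (score: 0)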

import pandas as pd

d = {'Variations': ['Size| Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5xXlarge; ',
                    'Size| Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5xlarge; ',
                    'Sizes| Small - ( only one mic tab); Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5Xlarge;',
                    'Sizes| Small - ( only one mic tab); Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5Xlarge; ',
                    'SIZE - COLOR| L/XL - Lime; 2XL/3XL - Lime;']}

df = pd.DataFrame(data=d)


def size_extractor(data):
    size_list = list(map(lambda x: x.strip(), data.split('|')[1].split(';')))
    print(size_list)


df['Variations'].apply(size_extractor)

Output

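Code explanation

data.split('|')[1]: splits the data at "|" and keeps the part after it, which the following steps work on

split(';'): splits that part at each ";"

map(lambda x: x.strip(), ...): strips leading and trailing whitespace from each piece

list(): converts the iterator returned by map() into a list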
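
One detail the steps above leave in place: every sample row ends with ";", so split(';') produces an empty string as the last item. A minimal sketch of a variant that drops empty pieces and returns the list instead of printing it (the name extract_sizes is hypothetical, not part of the answer):

```python
def extract_sizes(text):
    """Return the ';'-separated items that follow the first '|' in text."""
    # partition() yields an empty tail when there is no '|', so rows without
    # a separator give an empty list instead of raising IndexError.
    _, _, tail = text.partition('|')
    # Trim each piece and skip the empty one left by the trailing ';'.
    return [piece.strip() for piece in tail.split(';') if piece.strip()]


row = 'Size| Medium; Large; Xlarge; 2Xlarge; 3Xlarge; 4Xlarge; 5xXlarge; '
print(extract_sizes(row))
# ['Medium', 'Large', 'Xlarge', '2Xlarge', '3Xlarge', '4Xlarge', '5xXlarge']
```

Because the function returns its result, df['Variations'].apply(extract_sizes) should give a Series of lists rather than a Series of None.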

Answer 1: (score: 0)

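def size_extracter(data):
    # Expects input already rewritten by the str.replace calls below:
    # label removed and every ';' turned into '|'.
    print(data)
    size_list = []
    size = ""

    for char in data:
        if char == "|":
            # '|' ends an item: store what has been collected so far.
            size_list.append(size)
            continue
        elif char == " ":
            # A space restarts the buffer, so only the last word of a
            # multi-word item (e.g. "tab)") is kept.
            size = ""
            continue
        else:
            size = size + char

    print(size_list)


# Strip the label up to and including '|', then turn each ';' into '|'.
# regex=True is passed explicitly because pandas 2.0 changed the default to False.
df['Variations'] = df['Variations'].str.replace(r'^[^|]*\|\s*', '', regex=True).str.replace(';', '|', regex=False)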

df['Variations'].apply(size_extracter)
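
Note that the loop above clears size at every space, so a multi-word entry such as "Small - ( only one mic tab)" is reduced to its last word, "tab)". If whole entries are wanted, here is a sketch of the same character-by-character idea that resets only at the delimiter and works on the raw string, without the str.replace preprocessing (the name extract_sizes_loop is hypothetical):

```python
def extract_sizes_loop(text):
    """Collect the ';'-terminated items that follow the first '|'."""
    sizes = []
    current = ""
    started = False  # becomes True once the first '|' has been seen
    for char in text:
        if not started:
            # Skip the label ("Size", "Sizes", ...) up to and including '|'.
            started = char == "|"
        elif char == ";":
            # ';' closes an item: keep it if anything is left after trimming.
            item = current.strip()
            if item:
                sizes.append(item)
            current = ""
        else:
            current += char
    return sizes


row = 'Sizes| Small - ( only one mic tab); Medium; Large; '
print(extract_sizes_loop(row))
# ['Small - ( only one mic tab)', 'Medium', 'Large']
```

This answers the while-loop question indirectly: a plain for loop with a small amount of state is enough, and it can be passed to df['Variations'].apply in the same way as the functions above.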