Split a string based on a condition and append to columns

Time: 2021-03-23 16:35:41

Tags: python pandas string split

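I'm looking at data from an online survey. Some responses contain a reason code, a reason title, and some free text; others contain only free text. The three kinds of reason can look like this:

  1. 01|TOO_BIG|Item is too big
  2. Item is too big
  3. Too big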

When I split the string with the following code, the free text of options 2 and 3 ends up in the first column of the split.

df[['reason_code','reason', 'free_text']] = df['reason_description'].str.split('|', expand=True)
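To make the problem concrete, here is a minimal reproduction. It assumes a DataFrame built from the three sample values above; the real survey data is not shown, so the frame is only illustrative.

```python
import pandas as pd

# Illustrative data: one coded reason and two free-text-only reasons
df = pd.DataFrame({
    "reason_description": [
        "01|TOO_BIG|Item is too big",
        "Item is too big",
        "Too big",
    ]
})

# Same split as in the question
df[["reason_code", "reason", "free_text"]] = (
    df["reason_description"].str.split("|", expand=True)
)

# For rows 1 and 2 the free text sits in reason_code,
# and reason / free_text are missing
print(df)
```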

This is the result I want to achieve:

Reason code   Reason    Text
01            TOO_BIG   Item is too big

3 answers:

Answer 0: (score: 0)

You can do it like this:

import pandas as pd

df = pd.DataFrame([["A|1|X"],["B|2|Y"]], columns=['reason_description'])
df = df.join(df['reason_description'].str.split("|", expand=True).rename(columns={0:'reason_code',1: 'reason', 2:'free_text'}))
print(df)

Which gives you:

  reason_description reason_code reason free_text
0              A|1|X           A      1         X
1              B|2|Y           B      2         Y
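The example above only covers rows that contain all three parts. For the free-text-only rows described in the question, one possible approach is to split as before and then move the value into free_text wherever the original string has no "|". This is a minimal sketch; it assumes the three sample values from the question and that at least one row splits into exactly three parts, so the split produces three columns.

```python
import pandas as pd

# Illustrative data taken from the question's three sample values
df = pd.DataFrame({
    "reason_description": [
        "01|TOO_BIG|Item is too big",
        "Item is too big",
        "Too big",
    ]
})

# Split into three columns (assumes at least one row has three parts)
parts = df["reason_description"].str.split("|", expand=True)
parts.columns = ["reason_code", "reason", "free_text"]

# Rows without a "|" contain free text only
free_only = ~df["reason_description"].str.contains("|", regex=False)

# Move the free text to the last column and clear the first one
parts.loc[free_only, "free_text"] = parts.loc[free_only, "reason_code"]
parts.loc[free_only, "reason_code"] = None

df = df.join(parts)
print(df)
```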

Answer 1: (score: 0)

You can use a Python dictionary like this:

code = '01|TOO_BIG|Item is too big'  # Our code
# Creating a dictionary with each value corresponding to a None value
r = {
    "reason_code": None,
    "reason": None,
    "text": None,
}

# Assuming each code splits into exactly 3 parts...
# Split once and reuse the result instead of splitting three times
parts = code.split("|")
# The 0th item corresponds to our 'reason code'
r['reason_code'] = parts[0]
# The 1st item corresponds to our 'reason'
r['reason'] = parts[1]
# The 2nd item corresponds to our 'text'
r['text'] = parts[2]
print(r)
'''
You can research more about python dictionaries at:
https://www.w3schools.com/python/python_dictionaries.asp
'''

This assumes your code is separated by "|" and always has exactly 3 parts.
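If some strings contain only free text, as in the question, the three-part assumption no longer holds. A rough sketch of one way to relax it: check how many parts the split produces and treat anything other than three parts as free text. The function name parse_reason is hypothetical, chosen only for this illustration.

```python
def parse_reason(description):
    """Split a reason string; anything without three parts is free text only."""
    parts = description.split("|", 2)  # split on at most two separators
    if len(parts) == 3:
        code, reason, text = parts
    else:
        # No code or title present: keep the whole string as free text
        code, reason, text = None, None, description
    return {"reason_code": code, "reason": reason, "text": text}


print(parse_reason("01|TOO_BIG|Item is too big"))
print(parse_reason("Too big"))
```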

Answer 2: (score: 0)

I assume you have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'reason_description': ['01|TOO_BIG|Item is too big']})
print(df)
"""
           reason_description
0  01|TOO_BIG|Item is too big
"""

Then your code will return:

df[['reason_code', 'reason', 'text']] = df['reason_description'].str.split('|', expand=True)
print(df)
"""
           reason_description reason_code   reason             text
0  01|TOO_BIG|Item is too big          01  TOO_BIG  Item is too big
"""

You can drop the reason_description column this way:

col_dict = {
    0: 'reason_code',
    1: 'reason', 
    2: 'free_text'
}

df = df['reason_description'].str.split('|', expand=True).rename(columns=col_dict)
print(df)
"""
  reason_code   reason        free_text
0          01  TOO_BIG  Item is too big
"""