Question

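I am looking at data from an online survey. Each reason is stored either as a reason code, a reason title, and some free text, or as free text only. The three kinds of reasons can look like this:

01|TOO_BIG|Item is too big
Item is too big
Too big

When I split the string with the code below, the free text of options 2 and 3 ends up in the first column of the split.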

df[['reason_code','reason', 'free_text']] = df['reason_description'].str.split('|', expand=True)

This is the result I want to achieve:

reason code	reason	text
01	TOO_BIG	Item is too big

Answer 1

You can do it like this:

import pandas as pd

df = pd.DataFrame([["A|1|X"],["B|2|Y"]], columns=['reason_description'])
df = df.join(df['reason_description'].str.split("|", expand=True).rename(columns={0:'reason_code',1: 'reason', 2:'free_text'}))
print(df)

which will give you:

  reason_description reason_code reason free_text
0              A|1|X           A      1         X
1              B|2|Y           B      2         Y
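
Note that the split above only lines up when every row contains two "|" separators. For the free-text-only rows from the question, one possible workaround is to split first and then move the text of the rows without any "|" into the last column. This is just a sketch, assuming each row has either exactly three fields or no "|" at all; the names `parts` and `free_only` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "reason_description": [
        "01|TOO_BIG|Item is too big",
        "Item is too big",
        "Too big",
    ]
})

cols = ["reason_code", "reason", "free_text"]
parts = df["reason_description"].str.split("|", expand=True)
# Guarantee three columns even if no row happens to contain two pipes.
parts = parts.reindex(columns=range(3)).astype(object)
parts.columns = cols

# Rows without any "|" have their whole text in the first column: move it.
free_only = ~df["reason_description"].str.contains("|", regex=False)
parts.loc[free_only, "free_text"] = parts.loc[free_only, "reason_code"]
parts.loc[free_only, ["reason_code", "reason"]] = None

df = df.join(parts)
print(df)
```

Rows with only free text then have missing values in reason_code and reason, and their text sits in free_text.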

Answer 2

You can use a Python dictionary like this:

code = '01|TOO_BIG|Item is too big'  # Our code
# Create a dictionary with every key initialised to None
r = {
    "reason_code": None,
    "reason": None,
    "text": None,
}

# Assuming each code splits into exactly 3 parts...
parts = code.split("|")
# The 0th item corresponds to our 'reason code'
r['reason_code'] = parts[0]
# The 1st item corresponds to our 'reason'
r['reason'] = parts[1]
# The 2nd item corresponds to our 'text'
r['text'] = parts[2]
print(r)
# You can read more about Python dictionaries at:
# https://www.w3schools.com/python/python_dictionaries.asp

This assumes that your code is split on each "|" and has exactly 3 fields.
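
If that assumption does not hold, as with the free-text-only options in the question, a small helper can fall back to treating the whole string as text. This is only a sketch; `parse_reason` is a hypothetical name, and it assumes anything without two "|" separators is free text.

```python
def parse_reason(description):
    """Split a reason string into code, title and free text.

    Strings without two "|" separators are treated as free text only.
    """
    parts = description.split("|", 2)  # at most 3 pieces
    if len(parts) == 3:
        code, reason, text = parts
        return {"reason_code": code, "reason": reason, "text": text}
    return {"reason_code": None, "reason": None, "text": description}


for raw in ["01|TOO_BIG|Item is too big", "Item is too big", "Too big"]:
    print(parse_reason(raw))
```

Limiting the split to two separators means any extra "|" inside the free text stays in the text field.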

Answer 3

I assume you have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'reason_description': ['01|TOO_BIG|Item is too big']})
print(df)
"""
           reason_description
0  01|TOO_BIG|Item is too big
"""

Then your code returns:

df[['reason_code', 'reason', 'text']] = df['reason_description'].str.split('|', expand=True)
print(df)
"""
           reason_description reason_code   reason             text
0  01|TOO_BIG|Item is too big          01  TOO_BIG  Item is too big
"""

You can drop the reason_description column this way:

col_dict = {
    0: 'reason_code',
    1: 'reason', 
    2: 'free_text'
}

df = df['reason_description'].str.split('|', expand=True).rename(columns=col_dict)
print(df)
"""
  reason_code   reason        free_text
0          01  TOO_BIG  Item is too big
"""

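For rows that contain only free text, a regular expression with an optional "code|title|" prefix might be another way to get the three columns in one step. This is a rough sketch using `Series.str.extract` with named groups; the pattern is my own assumption about the data format, not something taken from the question.

```python
import pandas as pd

df = pd.DataFrame({"reason_description": [
    "01|TOO_BIG|Item is too big", "Item is too big", "Too big"]})

# Optional "code|title|" prefix, followed by the free text.
pattern = r"^(?:(?P<reason_code>[^|]*)\|(?P<reason>[^|]*)\|)?(?P<free_text>.*)$"
result = df["reason_description"].str.extract(pattern)
print(result)
```

The named groups become the column names, and rows without the prefix get missing values in reason_code and reason.
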
Split a string based on a condition and append to columns