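I'm looking at data from an online survey that includes a reason code, a reason title, and some free text, as well as options that contain only free text. The three kinds of reasons might look like this:
When I split the string with the following code, the free-text fields of options 2 and 3 end up in the first column of the split.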
df[['reason_code','reason', 'free_text']] = df['reason_description'].str.split('|', expand=True)
This is the result I want to achieve:
Reason code | Reason | Text |
---|---|---|
01 | TOO_BIG | Item is too big |
Answer 0: (score: 0)
You can do it like this:
import pandas as pd
df = pd.DataFrame([["A|1|X"],["B|2|Y"]], columns=['reason_description'])
df = df.join(df['reason_description'].str.split("|", expand=True).rename(columns={0:'reason_code',1: 'reason', 2:'free_text'}))
print(df)
Which will give you:
reason_description reason_code reason free_text
0 A|1|X A 1 X
1 B|2|Y B 2 Y
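One thing worth guarding against with this approach: if the free text itself contains a '|', an unlimited split produces more than three columns. A minimal sketch, assuming a made-up row for illustration, that caps the number of splits with the `n` parameter of `str.split`:

```python
import pandas as pd

# Hypothetical row whose free text itself contains a '|'
df = pd.DataFrame({'reason_description': ['01|TOO_BIG|Item is too big | will not fit']})

# n=2 performs at most two splits, so everything after the second '|' stays together
parts = df['reason_description'].str.split('|', n=2, expand=True)
parts.columns = ['reason_code', 'reason', 'free_text']

df = df.join(parts)
print(df)
```

With `n=2` the third column keeps the remainder of the string, including any further '|' characters.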
Answer 1: (score: 0)
You can use a Python dictionary like this:
code = '01|TOO_BIG|Item is too big' # Our code
# Creating a dictionary with each value corresponding to a None value
r = {
"reason_code": None,
"reason": None,
"text": None,
}
# Assuming each code contains exactly 3 '|'-separated fields...
# Split once and reuse the parts instead of splitting three times
parts = code.split("|")
# The 0th item corresponds to our 'reason code'
r['reason_code'] = parts[0]
# The 1st item corresponds to our 'reason'
r['reason'] = parts[1]
# The 2nd item corresponds to our 'text'
r['text'] = parts[2]
print(r)
'''
You can research more about python dictionaries at:
https://www.w3schools.com/python/python_dictionaries.asp
'''
This assumes your code is split on each "|" and contains exactly 3 fields.
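If that assumption does not hold, for example for the free-text-only options mentioned in the question, indexing `[1]` or `[2]` raises an `IndexError`. A rough sketch of one way to handle this: `parse_reason` is a hypothetical helper, and it assumes that an entry without any "|" is free text only.

```python
def parse_reason(code):
    """Split 'code|reason|text' into a dict, padding missing fields with None."""
    keys = ("reason_code", "reason", "text")
    if "|" not in code:
        # Assumption: an entry without any '|' is free text only
        return {"reason_code": None, "reason": None, "text": code}
    # Split at most twice so a '|' inside the free text is preserved
    parts = code.split("|", 2)
    # Pad with None when fewer than three fields are present
    parts += [None] * (len(keys) - len(parts))
    return dict(zip(keys, parts))


print(parse_reason('01|TOO_BIG|Item is too big'))
print(parse_reason('Some free text only'))
```

Whether a two-field entry should map to code and reason, as it does here, depends on how the survey actually encodes its options.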
Answer 2: (score: 0)
I assume you have a DataFrame like this:
import pandas as pd
df = pd.DataFrame({'reason_description': ['01|TOO_BIG|Item is too big']})
print(df)
"""
reason_description
0 01|TOO_BIG|Item is too big
"""
Then your code will return:
df[['reason_code', 'reason', 'text']] = df['reason_description'].str.split('|', expand=True)
print(df)
"""
reason_description reason_code reason text
0 01|TOO_BIG|Item is too big 01 TOO_BIG Item is too big
"""
You can drop the reason_description column this way:
col_dict = {
0: 'reason_code',
1: 'reason',
2: 'free_text'
}
df = df['reason_description'].str.split('|', expand=True).rename(columns=col_dict)
print(df)
"""
reason_code reason free_text
0 01 TOO_BIG Item is too big
"""