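I have a pandas Series in which each row contains a string in the following (key - value) format:
"Customer Name - Eric\nFamily Name - Lammela\nShirt color - white\n\n"
The fields in the string can vary from row to row, for example: "Customer Name - Leo\nFamily Name - Messi\nPants color - black\n"
I want to convert the whole Series into a DataFrame. What is the most efficient way to do this?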
Answer (score: 0)
You could try something like this. I tested it with the examples you provided.
import json
import re
import pandas as pd

# Store your example strings in a Series
s = pd.Series(["Customer Name - Eric\nFamily Name - Lammela\nShirt color - white\n\n",
               "Customer Name - Leo\nFamily Name - Messi\nPants color - black\n"])

# Define a function that converts each string in the Series to a JSON string
def str_to_dict(txt):
    txt = txt.rstrip('\n')            # drop trailing newlines
    txt = re.sub('^', '{"', txt)      # open the object and the first key
    txt = re.sub(' - ', '": "', txt)  # "key - value" -> "key": "value"
    txt = re.sub('\n', '", "', txt)   # each newline starts a new key/value pair
    txt = re.sub('$', '"}', txt)      # close the last value and the object
    return txt

# Apply the function to the Series and store the results in a new Series
s1 = s.apply(str_to_dict)

# Parse each JSON string into a dict and build the DataFrame in one call.
# (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0,
# and appending row by row in a loop is slow anyway.)
df = pd.DataFrame([json.loads(c) for c in s1])

# Print the df to check the results
print(df)
  Customer Name Family Name Shirt color Pants color
0          Eric     Lammela       white         NaN
1           Leo       Messi         NaN       black
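Building JSON text with regex substitutions breaks as soon as a value contains a double quote or a backslash, because the generated string is then no longer valid JSON. A minimal sketch that skips the JSON round-trip and parses each string straight into a dict, assuming every field sits on its own line and " - " (space, hyphen, space) separates key from value; `parse_record` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

s = pd.Series([
    "Customer Name - Eric\nFamily Name - Lammela\nShirt color - white\n\n",
    "Customer Name - Leo\nFamily Name - Messi\nPants color - black\n",
])

def parse_record(txt):
    """Turn 'key - value' lines into a dict, skipping blank lines."""
    record = {}
    for line in txt.splitlines():
        # partition splits on the first " - " only, so values may contain " - "
        key, sep, value = line.partition(" - ")
        if sep:  # keep only lines that actually contain the separator
            record[key.strip()] = value.strip()
    return record

# One dict per row; pandas aligns the keys into columns and fills gaps with NaN
df = pd.DataFrame(s.map(parse_record).tolist())
print(df)
```

Keys that are missing from a given row come out as NaN, as in the output above, and the DataFrame is built in a single constructor call rather than row by row.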
Hope this helps.