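Split key-value strings in Python and move them into df columns

Time: 2019-03-14 06:11:54

Tags: python regex pandas split

This is my column. I want to split it into key-value pairs and store them in new columns of a pandas df.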

{"FontStyle"=>"Gill Sans Standard", "FontSize"=>"Medium (3mm)"}
{"Font Style"=>"Gill Sans Standard","Font Size"=>"Medium (3mm)"}
{"Font Style":"Script","Font Size":"Medium (3mm)"}
{"Font Style"=>"Gill Sans Standard","Font Size"=>"Medium (3mm)"}
{"Font Style":"Gill Sans Standard","Font Size":"Medium (3mm)"}

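The main problem is that some of the rows use '=>' while others use a colon.

I want to add two new columns to the df, one for "Font Style" and one for "Font Size", holding their respective values.

It would be great if someone could help me achieve this, and also great if you could recommend some books or tutorials on regular expressions.

Thanks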

3 answers:

Answer 0: (score: 1)

This is by no means the most efficient code, but it gets the job done.

import pandas as pd
import ast

text = '''{"FontStyle"=>"Gill Sans Standard", "FontSize"=>"Medium (3mm)"}
{"Font Style"=>"Gill Sans Standard","Font Size"=>"Medium (3mm)"}
{"Font Style"=>"Script","Font Size"=>"Medium (3mm)"}
{"Font Style"=>"Gill Sans Standard","Font Size"=>"Medium (3mm)"}'''

my_list = []

text = text.replace("FontStyle", "Font Style")
text = text.replace("FontSize", "Font Size")
text = text.replace("=>", ":")
text = text.split("\n")

for one_dict in text:
    my_list.append(ast.literal_eval(one_dict))

df = pd.DataFrame(my_list)
print(df)

Output of the code above:

      Font Size          Font Style
0  Medium (3mm)  Gill Sans Standard
1  Medium (3mm)  Gill Sans Standard
2  Medium (3mm)              Script
3  Medium (3mm)  Gill Sans Standard

I hope this helps. :-) Let me know if it works.
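The same normalisation can also be wrapped in a small per-row helper. The following is only a minimal sketch: `parse_row` is a hypothetical name, and `json.loads` is swapped in for `ast.literal_eval` on the assumption that every row uses double quotes, as in the question, so the normalised string is also valid JSON.

```python
import json

def parse_row(raw):
    """Turn one '{"Key"=>"Value", ...}' string into a dict (hypothetical helper)."""
    # Unify the separator; note this would also rewrite a '=>' inside a value.
    cleaned = raw.replace("=>", ":")
    # Unify the two spellings of each key.
    cleaned = cleaned.replace("FontStyle", "Font Style").replace("FontSize", "Font Size")
    return json.loads(cleaned)

rows = [
    '{"FontStyle"=>"Gill Sans Standard", "FontSize"=>"Medium (3mm)"}',
    '{"Font Style":"Script","Font Size":"Medium (3mm)"}',
]
records = [parse_row(r) for r in rows]
print(records)
```

The resulting list of dicts can be passed to `pd.DataFrame` in the same way as `my_list` in the answer.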

Answer 1: (score: 1)

Try this:

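import ast
import pandas as pd

# normalise the separator and key names so each row is a valid dict literal
df['col'] = df['col'].str.replace('=>', ': ').str.replace('FontSize', 'Font Size').str.replace('FontStyle', 'Font Style')
df['col'] = df['col'].apply(ast.literal_eval)  # parse each string into a dict
df1 = df['col'].apply(pd.Series)  # expand the dicts into one column per key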
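The snippet assumes a DataFrame `df` with a string column `col` already exists. An end-to-end sketch under that assumption, using two sample rows from the question and joining the parsed columns back onto the original frame (the names `normalised`, `parsed`, `new_cols` and `result` are illustrative only):

```python
import ast
import pandas as pd

# Sample frame; the column name "col" is an assumption carried over from the answer.
df = pd.DataFrame({"col": [
    '{"FontStyle"=>"Gill Sans Standard", "FontSize"=>"Medium (3mm)"}',
    '{"Font Style":"Script","Font Size":"Medium (3mm)"}',
]})

# Literal (non-regex) replacements: unify the separator and the key spellings.
normalised = (
    df["col"]
    .str.replace("=>", ": ", regex=False)
    .str.replace("FontSize", "Font Size", regex=False)
    .str.replace("FontStyle", "Font Style", regex=False)
)

# Each string is now a valid Python dict literal.
parsed = normalised.apply(ast.literal_eval)

# Build one column per key and attach the new columns to the original frame.
new_cols = pd.DataFrame(parsed.tolist(), index=df.index)
result = df.join(new_cols)
print(result[["Font Style", "Font Size"]])
```

Building the new frame from `parsed.tolist()` avoids calling `pd.Series` once per row, which tends to be slow on large columns.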

Answer 2: (score: 0)

I think regex is not necessary here; use:

import ast
import pandas as pd

print (df)
                                                 col
0  {"FontStyle"=>"Gill Sans Standard", "FontSize"...
1  {"Font Style"=>"Gill Sans Standard","Font Size...
2  {"Font Style":"Script","Font Size":"Medium (3m...
3  {"Font Style"=>"Gill Sans Standard","Font Size...
4  {"Font Style":"Gill Sans Standard","Font Size"...
5                                                NaN

d = {'=>':':', 'FontSize':'Font Size','FontStyle':'Font Style'}
regex = '|'.join(r"{}".format(x) for x in d.keys())
df1 = (df['col'].dropna()
                .str.replace(regex, lambda x: d[x.group()], regex=True)
                .apply(ast.literal_eval))

df2 = pd.DataFrame(df1.values.tolist())[['Font Size','Font Style']].dropna(how='all')
print (df2)
      Font Size          Font Style
0  Medium (3mm)  Gill Sans Standard
1  Medium (3mm)  Gill Sans Standard
2  Medium (3mm)              Script
3  Medium (3mm)  Gill Sans Standard
4  Medium (3mm)  Gill Sans Standard

Explanation

  1. First remove missing rows with DataFrame.dropna
  2. Then use Series.str.replace with the values from the dictionary
  3. Convert the values to dictionaries with ast.literal_eval
  4. Create a new DataFrame
  5. If necessary, filter the columns by a list and remove rows that contain only missing values
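Since the question also asks about regular expressions, the pairs can alternatively be pulled out directly with a pattern instead of evaluating the string. This is only a sketch of that alternative, not part of the answer above: `PAIR`, `KEY_NAMES` and `to_record` are hypothetical names, and it assumes keys and values are always double-quoted and contain no embedded double quotes.

```python
import re
import pandas as pd

# "key" followed by either '=>' or ':' and then "value".
PAIR = re.compile(r'"([^"]+)"\s*(?:=>|:)\s*"([^"]*)"')

# Map the unspaced key spellings onto the spaced ones.
KEY_NAMES = {"FontStyle": "Font Style", "FontSize": "Font Size"}

def to_record(raw):
    """Return a dict of the key/value pairs found in raw; {} for missing values."""
    if not isinstance(raw, str):
        return {}
    return {KEY_NAMES.get(key, key): value for key, value in PAIR.findall(raw)}

col = pd.Series([
    '{"FontStyle"=>"Gill Sans Standard", "FontSize"=>"Medium (3mm)"}',
    '{"Font Style":"Script","Font Size":"Medium (3mm)"}',
    float("nan"),
])

# Rows without any pairs become all-NaN rows and are dropped.
out = pd.DataFrame([to_record(v) for v in col]).dropna(how="all")
print(out)
```

Here `(?:=>|:)` is a non-capturing group that accepts either separator, so no prior replacement of '=>' is needed.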