This is my column. I want to split it into key-value pairs and store them in new columns of a pandas df:
{"FontStyle"=>"Gill Sans Standard", "FontSize"=>"Medium (3mm)"}
{"Font Style"=>"Gill Sans Standard","Font Size"=>"Medium (3mm)"}
{"Font Style":"Script","Font Size":"Medium (3mm)"}
{"Font Style"=>"Gill Sans Standard","Font Size"=>"Medium (3mm)"}
{"Font Style":"Gill Sans Standard","Font Size":"Medium (3mm)"}
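The main problem is that some rows use '=>' and others use a colon.
I want to add two new columns to the df, one for "Font Style" and one for "Font Size", each holding the respective values.
It would be great if someone could help me achieve this, and if you could recommend some books/tutorials on regular expressions, that would be great too.
Thanks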
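Since the question also asks about regular expressions, here is a minimal regex-only sketch that skips `ast.literal_eval` entirely. It assumes the column is named `col` (hypothetical), that every key and value is a double-quoted string with no embedded quotes, and that camel-case keys such as `FontStyle` should become `Font Style`; the helper `parse_pairs` is an illustrative name, not part of any answer below.

```python
import re

import pandas as pd

# Sample values copied from the question; the column name "col" is an assumption.
df = pd.DataFrame({"col": [
    '{"FontStyle"=>"Gill Sans Standard", "FontSize"=>"Medium (3mm)"}',
    '{"Font Style"=>"Gill Sans Standard","Font Size"=>"Medium (3mm)"}',
    '{"Font Style":"Script","Font Size":"Medium (3mm)"}',
    '{"Font Style"=>"Gill Sans Standard","Font Size"=>"Medium (3mm)"}',
    '{"Font Style":"Gill Sans Standard","Font Size":"Medium (3mm)"}',
]})

# One "key" <sep> "value" pair, where <sep> is either => or :
PAIR = re.compile(r'"([^"]*)"\s*(?:=>|:)\s*"([^"]*)"')
# Zero-width position between a lowercase and an uppercase letter ("FontStyle" -> "Font Style")
CAMEL = re.compile(r'(?<=[a-z])(?=[A-Z])')


def parse_pairs(text):
    """Return a dict of the quoted key/value pairs found in text (hypothetical helper)."""
    return {CAMEL.sub(" ", key): value for key, value in PAIR.findall(text)}


# Parse every row, then attach one new column per key to the original frame
parsed = df["col"].map(parse_pairs)
df = df.join(pd.DataFrame(parsed.tolist(), index=df.index))
print(df[["Font Style", "Font Size"]])
```

Because the separator is matched as `(?:=>|:)`, both styles are handled in one pass, with no string replacement beforehand.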
Answer 0 (score: 1)
This is by no means the most efficient code, but it gets the job done.
import pandas as pd
import ast
text = '''{"FontStyle"=>"Gill Sans Standard", "FontSize"=>"Medium (3mm)"}
{"Font Style"=>"Gill Sans Standard","Font Size"=>"Medium (3mm)"}
{"Font Style"=>"Script","Font Size"=>"Medium (3mm)"}
{"Font Style"=>"Gill Sans Standard","Font Size"=>"Medium (3mm)"}'''
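my_list = []
# Normalize the key names and the separator so every line is a valid Python dict literal
text = text.replace("FontStyle", "Font Style")
text = text.replace("FontSize", "Font Size")
text = text.replace("=>", ":")
text = text.split("\n")
# Parse each line into a dict
for one_dict in text:
    my_list.append(ast.literal_eval(one_dict))
# Each dict becomes one row; its keys become the column names
df = pd.DataFrame(my_list)
print(df)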
Output of the code above:
      Font Size          Font Style
0  Medium (3mm)  Gill Sans Standard
1  Medium (3mm)  Gill Sans Standard
2  Medium (3mm)              Script
3  Medium (3mm)  Gill Sans Standard
I hope this helps :-) Let me know if it works.
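Why the `"=>"` replacement in Answer 0 is needed: a string containing `=>` is not a valid Python literal, so `ast.literal_eval` raises `SyntaxError` on it. A small check using one of the sample rows; the `json.loads` comparison is an addition, relying on the fact that the converted strings use only double-quoted keys and values and are therefore also valid JSON.

```python
import ast
import json

# A Ruby-style hash literal: "=>" is not Python syntax, so literal_eval rejects it
ruby_style = '{"Font Style"=>"Script","Font Size"=>"Medium (3mm)"}'
try:
    ast.literal_eval(ruby_style)
    parsed_raw = True
except SyntaxError:
    parsed_raw = False

# After swapping "=>" for ":" the string is a valid Python dict literal (and valid JSON)
python_style = ruby_style.replace("=>", ":")
parsed = ast.literal_eval(python_style)
as_json = json.loads(python_style)
print(parsed_raw, parsed)
```

The `FontStyle`/`FontSize` replacements serve a different purpose: without them those rows would parse fine but produce separate, unspaced column names.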
Answer 1 (score: 1)
Try this:
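import ast
import pandas as pd
# Normalize the separator and key names so each value is a valid Python dict literal
df['col'] = (df['col'].str.replace('=>', ': ')
             .str.replace('FontSize', 'Font Size').str.replace('FontStyle', 'Font Style'))
# Parse each string into a dict, then expand the dicts into one column per key
df['col'] = df['col'].apply(lambda x: dict(ast.literal_eval(x)))
df1 = df['col'].apply(pd.Series)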
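Answer 1 assumes an existing `df` with the strings in column `col` and leaves the result in a separate `df1`. A self-contained sketch with a hypothetical two-row sample frame, which joins the expanded columns back onto `df` as the question asks; `regex=False` is added only to make the literal replacements explicit, and the redundant `dict(...)` wrapper is dropped because `ast.literal_eval` already returns a dict here.

```python
import ast

import pandas as pd

# Hypothetical sample frame; "col" is the column name used in the answer above
df = pd.DataFrame({"col": [
    '{"FontStyle"=>"Gill Sans Standard", "FontSize"=>"Medium (3mm)"}',
    '{"Font Style":"Script","Font Size":"Medium (3mm)"}',
]})

# Literal (non-regex) replacements of the separator and the unspaced key names
cleaned = (df["col"]
           .str.replace("=>", ": ", regex=False)
           .str.replace("FontSize", "Font Size", regex=False)
           .str.replace("FontStyle", "Font Style", regex=False))

# Each cleaned string is a Python dict literal; expand the dicts into columns
expanded = cleaned.apply(ast.literal_eval).apply(pd.Series)
df = df.join(expanded)
print(df)
```

Rows containing NaN would make `ast.literal_eval` fail, so drop or fill them first if the real column has missing values.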
Answer 2 (score: 0)
I think regex is not necessary here; use:
import pandas as pd
import ast

print (df)
                                                 col
0  {"FontStyle"=>"Gill Sans Standard", "FontSize"...
1  {"Font Style"=>"Gill Sans Standard","Font Size...
2  {"Font Style":"Script","Font Size":"Medium (3m...
3  {"Font Style"=>"Gill Sans Standard","Font Size...
4  {"Font Style":"Gill Sans Standard","Font Size"...
5                                                NaN

d = {'=>':':', 'FontSize':'Font Size','FontStyle':'Font Style'}
regex = '|'.join(r"{}".format(x) for x in d.keys())
df1 = (df['col'].dropna()
                .str.replace(regex, lambda x: d[x.group()], regex=True)
                .apply(ast.literal_eval))
df2 = pd.DataFrame(df1.values.tolist())[['Font Size','Font Style']].dropna(how='all')
print (df2)
      Font Size          Font Style
0  Medium (3mm)  Gill Sans Standard
1  Medium (3mm)  Gill Sans Standard
2  Medium (3mm)              Script
3  Medium (3mm)  Gill Sans Standard
4  Medium (3mm)  Gill Sans Standard

Explanation:
Series.dropna removes the rows with missing values, Series.str.replace replaces each matched substring with its value from the dictionary d, and ast.literal_eval converts the values to dictionaries.
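The `str.replace` call in Answer 2 joins the keys of `d` into one alternation pattern (`=>|FontSize|FontStyle`) and passes a callable, so each match is replaced by its value in `d`. The same idea on a single string with the standard `re` module; wrapping the keys in `re.escape` is an addition not in the answer, guarding against keys that contain regex metacharacters.

```python
import re

# Same mapping as in the answer above
d = {'=>': ':', 'FontSize': 'Font Size', 'FontStyle': 'Font Style'}
# Alternation of the escaped keys: "=>|FontSize|FontStyle"
pattern = re.compile('|'.join(map(re.escape, d)))

s = '{"FontStyle"=>"Gill Sans Standard", "FontSize"=>"Medium (3mm)"}'
# For every match, look the matched text up in d and substitute the mapped value
result = pattern.sub(lambda m: d[m.group()], s)
print(result)
```

A single pass with a lookup avoids chaining one `replace` per key, which matters once the mapping grows.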