Question

Not sure how to approach this problem.

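I have a dataframe with thousands of columns, and in many cases ten of those columns are just a list of dictionaries split across multiple columns. I concatenated all the columns together and now want to search the result for certain predefined words. Whenever one of those words is found, I want to add the word as a column and put all the "values" assigned to that word into the column.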

Sample data (converted from the dataframe to a dictionary for convenience):

0    [{"date":"0 1 0" firstBoxerRating:[null null] ...
1    [{"date":"2 2 1" firstBoxerRating:[null null] ...
2    [{"date":"2013-10-05" firstBoxerRating:[null n...
dtype: object

Something like the following:

col_names = ['date', 'firstBoxerRating:', 'judges']
# for each i in col_names: add i as a column, and use the text between i and i+1 as the column value
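The pseudocode above (take the text between key i and key i+1) can be sketched with plain string searching. This is a minimal sketch, not a tested solution for the real data: `split_between_keys` is a hypothetical helper, the sample row is made up from the first displayed row (the `judges` value is a placeholder because the real one is not shown), and it assumes the keys appear in each string in the listed order. The keys are written without the trailing colon, since the colon is trimmed from the value anyway.

```python
def split_between_keys(text, keys):
    """Hypothetical helper: map each key to the text between it and the next key.

    Assumes the keys occur in `text` in the order given. The value of the last
    key runs to the end of the string. Keys that are not found map to None.
    """
    values = {}
    for idx, key in enumerate(keys):
        pos = text.find(key)
        if pos == -1:
            values[key] = None
            continue
        start = pos + len(key)
        end = len(text)
        # The value stops where the next key that is actually present begins.
        for next_key in keys[idx + 1:]:
            next_pos = text.find(next_key, start)
            if next_pos != -1:
                end = next_pos
                break
        # Trim the separators around the value: spaces, colons and double quotes.
        values[key] = text[start:end].strip(' :"')
    return values


# Made-up sample modeled on the first row above; the judges value is a placeholder.
row = '[{"date":"0 1 0" firstBoxerRating:[null null] judges:[]}]'
col_names = ["date", "firstBoxerRating", "judges"]
print(split_between_keys(row, col_names))
```

Applied to every string, the resulting list of dicts could then be turned into a table with `pd.DataFrame([split_between_keys(s, col_names) for s in ax_two])`. Note that the last key keeps any closing brackets that follow it, because nothing marks where its value ends.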

For this sample, the expected output would be:

date         firstBoxerRating
0 1 0         [null null]
2 2 1         [null null]
2013-10-05    [null n...
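Because the expected output is itself a DataFrame, pandas' `Series.str.extract` may be a more direct route: each named group in the regular expression becomes one output column, and rows without a match get NaN. This is a sketch under assumptions: the three full strings are made up from the truncated rows above (the third is completed as `[null null]`), `judges` is left out because its format is not shown, and the single pattern assumes `date` comes before `firstBoxerRating` in every row.

```python
import pandas as pd

# Made-up stand-in for the Series shown above; the full strings are assumed.
s = pd.Series([
    '[{"date":"0 1 0" firstBoxerRating:[null null]}]',
    '[{"date":"2 2 1" firstBoxerRating:[null null]}]',
    '[{"date":"2013-10-05" firstBoxerRating:[null null]}]',
])

# Named groups become column names: "date" captures the quoted value,
# "firstBoxerRating" captures the bracketed list that follows the key.
pattern = (
    r'"date":"(?P<date>[^"]*)"'
    r'.*?firstBoxerRating:(?P<firstBoxerRating>\[[^\]]*\])'
)
result = s.str.extract(pattern)
print(result)
```

If the keys can appear in any order, one `str.extract` call per key, combined with `pd.concat(..., axis=1)`, avoids relying on a fixed order.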

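I tried converting the dataframe to a dictionary and using regular expressions to assign values to the items in the list (used as keys):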

import re
boxers = {"date": [], "firstBoxerRating": [], "judges": []}
for i in ax_two:
    date_field = re.search("date: *", i)
    if date_field is not None:
        date = re.search(r'\w*\s\w*', date_field.group())

But this is the output:

{'date': [], 'firstBoxerRating': [], 'judges': []}
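Reading the attempt above, two things appear to keep the dictionary empty: the pattern `"date: *"` expects a colon right after `date`, but in the data the key is quoted (`"date":`), so the search returns None for every row; and even when a match is found, nothing is ever appended to `boxers`. A minimal sketch of a corrected loop, assuming the value formats visible in the sample (`"date":"<value>"` and `firstBoxerRating:[<value>]`), with made-up full strings standing in for `ax_two` and `judges` omitted because its format is not shown:

```python
import re

# Hypothetical sample rows modeled on the Series shown above (full strings assumed).
ax_two = [
    '[{"date":"0 1 0" firstBoxerRating:[null null]}]',
    '[{"date":"2 2 1" firstBoxerRating:[null null]}]',
    '[{"date":"2013-10-05" firstBoxerRating:[null null]}]',
]

# One compiled pattern per target column; group 1 captures the value.
patterns = {
    "date": re.compile(r'"date":"([^"]*)"'),
    "firstBoxerRating": re.compile(r'firstBoxerRating:(\[[^\]]*\])'),
}

boxers = {name: [] for name in patterns}
for text in ax_two:
    for name, pattern in patterns.items():
        match = pattern.search(text)
        # Append None when a key is missing so all lists keep the same length.
        boxers[name].append(match.group(1) if match else None)

print(boxers)
```

Since every list ends up with one entry per row, `pd.DataFrame(boxers)` would turn the dictionary into the table shown in the expected output.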

Splitting a dataframe into multiple columns based on string matching

0 answers: