Find occurrences of a list in a list of lists in a dataframe column

Time: 2019-06-17 16:24:49

Tags: python string list dataframe

I have a dataframe df with a single column.

import pandas as pd

data = {'details': [['brand : honda', 'car : city', 'colour : black'],['brand : toyota', 'car : innova'],
                    ['brand : honda', 'colour : red'], ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']]}
df = pd.DataFrame(data,columns= ['details'])
df

I want to split the dataframe into separate columns and get a dataframe that looks like this:

data = {'details': [['brand : honda', 'car : city', 'colour : black'],['brand : toyota', 'car : innova'],
                    ['brand : honda', 'colour : red'], ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']],
        'brand': ['honda', 'toyota', 'honda', 'toyota'],
        'car': ['city','innova','','corolla'],
        'colour': ['black','','red','white'],
        'type': ['','','','sedan']
        }
df2 = pd.DataFrame(data,columns= ['details', 'brand', 'car', 'colour', 'type'])
df2

I tried the following, but it did not work:

a2 = []
b2 = []
c2 = []
d2 = []
for i in df['details']:
    for j in range(len(i)):
        if 'brand :' in i[j]:
            print 'lalala'
            a1 = i[j]
            a2.append(a1)
        else:
            a1 = ''
            a2.append(a1)
        if 'car :' in i[j]:
            print 'lalala'
            b1 = i[j]
            b2.append(b1)
        else:
            b1 = ''
            b2.append(b1)
        if 'colour :' in i[j]:
            c1 = i[j]
            c2.append(c1)
        else:
            c1 = ''
            c2.append(c1)
        if 'type :' in i[j]:
            d1 = i[j]
            d2.append(d1)
        else:
            d1 = ''
            d2.append(d1)
df['brand'] = a2
df['car'] = b2
df['colour'] = c2
df['type'] = d2

I have hit a major roadblock here, so any help is appreciated.

3 answers:

Answer 0 (score: 0)

Assuming the detail types are known in advance, you can try the following:

details_types = ['brand', 'car', 'colour', 'type']

# create an empty column for every known detail type
for x in details_types:
    df[x] = None

# fill each row from its 'key : value' strings
for idx in range(len(df)):
    for col_details in df.iloc[idx, 0]:
        # split once on ':' and trim the surrounding spaces
        feature, value = [part.strip() for part in col_details.split(':', 1)]
        df.iloc[idx, df.columns.get_loc(feature)] = value

Output

|   |                      details                      | brand  |   car   | colour | type  |
|---|---------------------------------------------------|--------|---------|--------|-------|
| 0 | [brand : honda, car : city, colour : black]       | honda  | city    | black  | None  |
| 1 | [brand : toyota, car : innova]                    | toyota | innova  | None   | None  |
| 2 | [brand : honda, colour : red]                     | honda  | None    | red    | None  |
| 3 | [brand : toyota, car : corolla, colour : white... | toyota | corolla | white  | sedan |
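
If empty strings are preferred over None for the missing entries, as in the desired output in the question, the same idea can be sketched without row-by-row assignment. This is only a sketch under a few assumptions: `parse_details`, `parsed_rows`, `extra` and `result` are names made up for this example, and the detail types are still assumed to be known up front.

```python
import pandas as pd

df = pd.DataFrame({'details': [
    ['brand : honda', 'car : city', 'colour : black'],
    ['brand : toyota', 'car : innova'],
    ['brand : honda', 'colour : red'],
    ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan'],
]})

details_types = ['brand', 'car', 'colour', 'type']


def parse_details(items):
    """Turn ['brand : honda', ...] into {'brand': 'honda', ...} (hypothetical helper)."""
    parsed = {}
    for item in items:
        # partition splits on the first ':' only
        key, _, value = item.partition(':')
        parsed[key.strip()] = value.strip()
    return parsed


# one dict per row, then one column per known detail type
parsed_rows = [parse_details(items) for items in df['details']]
extra = pd.DataFrame(parsed_rows, index=df.index).reindex(columns=details_types)

# missing details become empty strings instead of NaN
result = df.join(extra.fillna(''))
print(result)
```

The `reindex(columns=details_types)` call fixes the column order and also creates a column for any known type that never appears in the data.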

Answer 1 (score: 0)

Here is a slightly simpler approach:

import pandas as pd

data = {'details': [['brand : honda', 'car : city', 'colour : black'],['brand : toyota', 'car : innova'],
                    ['brand : honda', 'colour : red'], ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']]}

# takes a list of 'key : value' strings and returns a dict, splitting each on ':'
def fix(l):
    return dict([part.strip() for part in s.split(':', 1)] for s in l)

# convert each inner list to a dict to get a list of dicts
dicts = [fix(sublist) for sublist in data['details']]

# build a single dataframe from the dicts (optionally add the 'details' column)
df = pd.DataFrame(dicts)
df['details'] = data['details']  # adding the 'details' column
print(df)
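
One detail to watch with this kind of parsing: `str.split(':')` on its own keeps the spaces around the separator, so the pieces usually need stripping before they are used as column names and values. A minimal illustration in plain Python; the `'url : ...'` string is a made-up input, not part of the question's data.

```python
item = 'brand : honda'

# Splitting on ':' alone keeps the spaces around the separator.
raw_key, raw_value = item.split(':')

# Stripping each piece gives a clean key and value.
key, value = (part.strip() for part in item.split(':', 1))

# Hypothetical input whose value itself contains ':'; maxsplit=1 splits only once.
url_key, url_value = (part.strip() for part in 'url : http://example.com'.split(':', 1))

print(repr(raw_key), repr(raw_value))
print(repr(key), repr(value))
print(repr(url_key), repr(url_value))
```

Without the stripping step the dataframe columns would be named `'brand '`, `'car '` and so on, which makes later lookups such as `df['brand']` fail.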

Answer 2 (score: 0)

import pandas as pd
from collections import ChainMap
data = {'details': [['brand : honda', 'car : city', 'colour : black'],['brand : toyota', 'car : innova'],
                ['brand : honda', 'colour : red'], ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']]}
#STEP_1
lists = [[{k.strip(): v.strip()} for k, v in (y.split(':', 1) for y in x)] for x in data['details']]
#STEP_2
data_df = [dict(ChainMap(*x)) for x in lists]
#STEP_3
data_df = pd.DataFrame(data_df)
#STEP_4
data_df['details'] = data['details']
print(data_df)
'''Explanation:
STEP_1: It creates a list of lists whose elements are single-entry
dictionaries (keys and values are stripped of surrounding spaces)

[[{'brand': 'honda'}, {'car': 'city'}, {'colour': 'black'}],
[{'brand': 'toyota'}, {'car': 'innova'}],
[{'brand': 'honda'}, {'colour': 'red'}],
[{'brand': 'toyota'},
{'car': 'corolla'},
{'colour': 'white'},
{'type': 'sedan'}]]

STEP_2: It converts the list of lists into a list of dictionaries
(the key order may differ between Python versions)

[{'colour': 'black', 'car': 'city', 'brand': 'honda'},
{'car': 'innova', 'brand': 'toyota'},
{'colour': 'red', 'brand': 'honda'},
{'type': 'sedan',
'colour': 'white',
'car': 'corolla',
'brand': 'toyota'}]

STEP_3: As we can directly create a dataframe from a list of
dictionaries, it creates a dataframe with 4 columns: brand,
car, colour & type. Details missing from a row show up as NaN.

STEP_4: Add the column 'details' using the 'data' variable'''
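
To make STEP_2 a little more concrete, here is a small standalone sketch of how `ChainMap` merges the single-entry dictionaries of one row. The `dup` list is an invented case that does not occur in the question's data; it is only there to show what happens if a key repeats within a row.

```python
from collections import ChainMap

# One row after STEP_1: a list of single-entry dicts.
row = [{'brand': 'honda'}, {'car': 'city'}, {'colour': 'black'}]

# ChainMap presents the dicts as one mapping; dict() copies it into a plain dict.
merged = dict(ChainMap(*row))

# Hypothetical row with a repeated key: ChainMap returns the value
# from the first dict that contains the key.
dup = [{'brand': 'honda'}, {'brand': 'toyota'}]
first_wins = dict(ChainMap(*dup))

# A plain update loop behaves the other way round: the last value wins.
last_wins = {}
for d in dup:
    last_wins.update(d)

print(merged)
print(first_wins, last_wins)
```

So if a row could list the same detail twice, the choice between `ChainMap` and a plain `update` loop decides which value ends up in the dataframe.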