Question

I have a DataFrame, and I want to split the strings in every column from the third through the last into two columns each, with the header kept on the first of the two split columns. This is the DataFrame:

Sample  Pop     a1      a10     a100
F295    Pesche  AC      AT      AA
F296    Pesche  GT      CG      AC
F297    Pesche  AA      GG      TT
F298    Pesche  AC      AG      CG

The DataFrame I want has every column from the third onward split into two tab-separated columns, one per character of the string.

This is not the same as the existing "split one column" questions. Please help.

Answer 1

You can create a MultiIndex in the columns: convert each string to a list of its characters, build one DataFrame per original column, and join them with concat:

df1 = df.set_index(['Sample','Pop'])
comp = [pd.DataFrame(df1[x].apply(list).values.tolist(), index=df1.index) for x in df1.columns]
df2 = pd.concat(comp, axis=1, keys=df1.columns)
print (df2)
              a1    a10    a100
               0  1   0  1    0  1
Sample Pop
F295   Pesche  A  C   A  T    A  A
F296   Pesche  G  T   C  G    A  C
F297   Pesche  A  A   G  G    T  T
F298   Pesche  A  C   A  G    C  G

If you need to avoid the MultiIndex, first join the column names with f-strings so that no column names are duplicated, and then call DataFrame.reset_index.
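The flattening step described in this answer could look roughly like the sketch below. It rebuilds the answer's df2 from the question's data so that it runs on its own; the `{name}_{position}` naming scheme and the variable `flat` are assumptions made for illustration, not necessarily what the answer's author wrote.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Sample': ['F295', 'F296', 'F297', 'F298'],
    'Pop': ['Pesche', 'Pesche', 'Pesche', 'Pesche'],
    'a1': ['AC', 'GT', 'AA', 'AC'],
    'a10': ['AT', 'CG', 'GG', 'AG'],
    'a100': ['AA', 'AC', 'TT', 'CG'],
})

# Same steps as in the answer: one two-column frame per original column
df1 = df.set_index(['Sample', 'Pop'])
comp = [pd.DataFrame(df1[x].apply(list).values.tolist(), index=df1.index)
        for x in df1.columns]
df2 = pd.concat(comp, axis=1, keys=df1.columns)

# Flatten the MultiIndex: each column label is a (name, position) tuple.
# The "name_position" format is a hypothetical choice.
flat = df2.copy()
flat.columns = [f'{name}_{pos}' for name, pos in flat.columns]

# Move Sample and Pop back from the index into ordinary columns
flat = flat.reset_index()
print(flat)
```

Joining the two levels into one string keeps every column name unique, which is the reason the answer gives for using f-strings before calling reset_index.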

Answer 2

You can use a for loop:

import pandas as pd

data = {
    'Sample': ['F295','F296','F297','F298'],
    'Pop': ['Pesche', 'Pesche', 'Pesche', 'Pesche'],
    'a1': ['AC', 'GT', 'AA', 'AC'],
    'a10': ['AT', 'CG', 'GG', 'AG'],
    'a100': ['AA', 'AC', 'TT', 'CG']
}

df = pd.DataFrame(data) # For reproducibility, you should include this kind of code in your next questions :)

for col_name in list(df.columns[2:]): # iterate over every column from the third one onward
    df[col_name] = df[col_name].apply(lambda x: f"{x[0]}\t{x[1]}") # put a tab between the two characters

df
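The loop above keeps one column per marker and only inserts a tab inside each string. If two real columns per marker are wanted, a variant of the same loop could look like the sketch below. It assumes every value is exactly two characters long; the `_2` suffix for the second column and the names `parts` and `out` are hypothetical choices, picked so that the original header stays on the first split column.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Sample': ['F295', 'F296', 'F297', 'F298'],
    'Pop': ['Pesche', 'Pesche', 'Pesche', 'Pesche'],
    'a1': ['AC', 'GT', 'AA', 'AC'],
    'a10': ['AT', 'CG', 'GG', 'AG'],
    'a100': ['AA', 'AC', 'TT', 'CG'],
})

# Start with the two identifier columns, then add two columns per marker
parts = [df[['Sample', 'Pop']]]
for col_name in df.columns[2:]:
    # First character keeps the original header
    parts.append(df[col_name].str[0].rename(col_name))
    # Second character gets a suffixed header (hypothetical naming)
    parts.append(df[col_name].str[1].rename(f'{col_name}_2'))

out = pd.concat(parts, axis=1)

# Write the result as tab-separated text
print(out.to_csv(sep='\t', index=False))
```

Giving the second column its own name avoids duplicate headers, which makes later column selection unambiguous.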

Splitting several columns in a DataFrame
