Pandas: split one row into two or more rows when applying a row-wise function

Time: 2020-03-15 09:51:49

Tags: python pandas dataframe apply

I have a DataFrame in pandas that looks like this:

df = pd.DataFrame([[4, 9],[4,9],[[1,2],[3,4]]], columns=['A', 'B'])
df
        A       B
0       4       9
1       4       9
2  [1, 2]  [3, 4]

But I would like to convert it into a table like this:

   A  B
0  4  9
1  4  9
2  1  2
3  3  4

Is it possible to do this by applying a row-wise function (using apply or another function in pandas)?

7 answers:

Answer 0 (score: 3)

Use a list comprehension and flatten the values with chain:

from itertools import chain

out = list(chain.from_iterable(item if isinstance(item[0],list) 
             else [item] for item in df[['A','B']].values))
df1 = pd.DataFrame(out, columns=['A','B'])

Or, as an alternative, a loop:

out = []
for x in df[['A','B']].values:
    if isinstance(x[0], list):
        for y in x:
            out.append(y)
    else:
        out.append(x)

df1 = pd.DataFrame(out, columns=['A','B'])
print (df1)
   A  B
0  4  9
1  4  9
2  1  2
3  3  4
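
For reference, here is a self-contained sketch of the chain approach above that can be run as-is. The helper name split_list_rows is hypothetical (it does not appear in the original answer), and the sketch assumes that a row either holds only scalars or holds only lists with one entry per column.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame([[4, 9], [4, 9], [[1, 2], [3, 4]]], columns=["A", "B"])


def split_list_rows(frame):
    """Hypothetical helper: turn every list cell of a list-only row into its own row."""
    rows = chain.from_iterable(
        # A row whose first cell is a list is assumed to contain only lists;
        # each of those lists becomes a new row. Scalar rows are kept as they are.
        row if isinstance(row[0], list) else [row]
        for row in frame.to_numpy().tolist()
    )
    return pd.DataFrame(list(rows), columns=frame.columns)


df1 = split_list_rows(df)
print(df1)
```

Wrapping the logic in a function keeps the original df untouched and makes the assumption about the row layout explicit in one place.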

Answer 1 (score: 1)

Use a list comprehension inside concat:

df = pd.DataFrame([[4, 9],[4,9],[[1,2],[3,4]],], columns=['A', 'B'])

print (pd.concat([df.loc[:1], *[pd.DataFrame(list(i),columns=df.columns) for i in df.loc[2:].to_numpy()]],
                 ignore_index=True))
   A  B
0  4  9
1  4  9
2  1  2
3  3  4

Answer 2 (score: 1)

You can do it like this:

#main piece - the rest is actually 'fixing' the multiindex piece to fit your purpose:
df=df.stack().explode().to_frame()

df["id"]=df.groupby(level=[0,1]).cumcount()

df.index=pd.MultiIndex.from_tuples(zip(df.index.get_level_values(0)+df['id'], df.index.get_level_values(1)))

df=df.drop(columns="id").unstack()

df.columns=map(lambda x: x[1], df.columns)

Output:

>>> df

   A  B
0  4  9
1  4  9
2  1  3
3  2  4

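Answer 3 (score: 0)

Using DataFrame.apply, Series.explode, DataFrame.mask, and DataFrame.where:

# Boolean frame marking the cells that hold lists
# (DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map)
types = df.applymap(type).eq(list)
arr = df.where(types).apply(pd.Series.explode).dropna().T.to_numpy()
# DataFrame.append was removed in pandas 2.0, so pd.concat is used here instead
pd.concat([df.mask(types).dropna(), pd.DataFrame(arr, columns=df.columns)], ignore_index=True)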

   A  B
0  4  9
1  4  9
2  1  2
3  3  4

Answer 4 (score: 0)

Using a simple for loop with if statements:

alist = df['A'].tolist()
blist = df['B'].tolist()

alist1 = []
blist1 = []
for k, r in zip(alist, blist):
    # Assumes that within a row both cells are two-element lists or both are scalars
    if isinstance(k, list):
        alist1.append(k[0])
        blist1.append(k[1])
    if isinstance(r, list):
        alist1.append(r[0])
        blist1.append(r[1])
    else:
        alist1.append(k)
        blist1.append(r)

df = pd.DataFrame({'A': alist1, 'B': blist1})

Answer 5 (score: 0)

Another solution, in addition to all the others proposed so far, using DataFrame.melt, DataFrame.explode, and DataFrame.pivot:

import pandas as pd

df = pd.DataFrame([[4, 9],[4,9],[[1,2],[3,4]]], columns=['A', 'B'])
# Create index column
df.reset_index(inplace=True)

tmp = df.melt(id_vars='index', var_name='columns').explode('value')

# Define indexes
idx = sum([list(range(len(tmp)//tmp['columns'].nunique())) for _ in range(tmp['columns'].nunique())], [])
tmp['index'] = idx

result_df = tmp.pivot(index='index', columns='columns', values='value')

result_df
columns  A  B
index        
0        4  9
1        4  9
2        1  3
3        2  4
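
The index list built with sum(..., []) above could also be produced by numbering the exploded values within each original column. The following is only a sketch of that variation, not part of the original answer; it assumes a pandas version in which explode accepts ignore_index (1.1 or later) and that every column explodes to the same number of values.

```python
import pandas as pd

df = pd.DataFrame([[4, 9], [4, 9], [[1, 2], [3, 4]]], columns=["A", "B"])

# Long format: one row per (original index, column name, value), lists exploded
tmp = (
    df.reset_index()
      .melt(id_vars="index", var_name="columns")
      .explode("value", ignore_index=True)
)

# Number the exploded values within each original column: 0, 1, 2, ...
tmp["index"] = tmp.groupby("columns").cumcount()

# Back to wide format, one row per new index value
result_df = tmp.pivot(index="index", columns="columns", values="value")
print(result_df)
```

As in the answer above, the list elements are paired position by position, so the third row holds the first element of each list.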

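Answer 6 (score: 0)

There is one issue with the question: it is not guaranteed that the list items in the same row always have the same length. If that assumption holds, the following answer can be used:

df.apply(pd.Series.explode)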


    A   B
0   4   9
1   4   9
2   1   3
2   2   4
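
Note that exploding each column pairs the list elements position by position, so the list row becomes (1, 3) and (2, 4) rather than (1, 2) and (3, 4), and the exploded rows keep the duplicated index label 2. A minimal sketch of how the index could be renumbered, assuming equal list lengths within each row; the multi-column form of DataFrame.explode used for comparison assumes pandas 1.3 or later:

```python
import pandas as pd

df = pd.DataFrame([[4, 9], [4, 9], [[1, 2], [3, 4]]], columns=["A", "B"])

# Explode every column, then replace the duplicated index labels with 0..n-1
exploded = df.apply(pd.Series.explode).reset_index(drop=True)

# Multi-column explode (assumes pandas >= 1.3) gives the same values in one call
exploded_multi = df.explode(["A", "B"], ignore_index=True)

print(exploded)
```

Both results have object dtype columns; a call such as exploded.astype(int) could be added if integer columns are needed.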