Pandas: sort columns and find the differences

Date: 2016-04-28 18:33:53

Tags: python pandas

I have a DataFrame that I want to sort so that column A matches column B. Anything without a match should go into column C.

My data looks like this:

filenamesLocal          FilenamesServer
  filea.csv                  fileab.csv
  filec.csv                  filea.csv
  fileab.csv                 filec.csv
  filexyz.csv
  fileyh.csv

I want to sort them so that filenamesLocal = FilenamesServer, with the remaining names listed in another column:

filenamesLocal          FilenamesServer        Difference
  filea.csv                  filea.csv         filexyz.csv    
  filec.csv                  filec.csv          fileyh.csv
  fileab.csv                 fileab.csv
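For two plain lists like these, the target layout can also be built without index alignment. This is a minimal sketch, assuming the names fit in memory and that only names missing from the server side belong in Difference (as in the sample); the lists are typed in from the example above.

```python
import pandas as pd

# Sample data from the question; the server list is shorter than the local one.
local = ["filea.csv", "filec.csv", "fileab.csv", "filexyz.csv", "fileyh.csv"]
server = ["fileab.csv", "filea.csv", "filec.csv"]

server_set = set(server)
# Keep the local order; split names into "also on the server" and "local only".
matched = [name for name in local if name in server_set]
difference = [name for name in local if name not in server_set]

# Columns of different lengths: building from Series pads the shorter one with NaN.
out = pd.DataFrame({
    "filenamesLocal": pd.Series(matched),
    "FilenamesServer": pd.Series(matched),
    "Difference": pd.Series(difference),
})
print(out)
```

Names that exist only on the server are ignored here; swapping the roles of the two lists would report those instead.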

My code so far:

    import pandas as pd

    ldsdata = pd.read_csv('filelist.csv', sep=" ", header=None)
    #data.to_csv("filelist.csv", index=False)
    dataproj = pd.read_csv('edslist.txt', sep=" ", header=None)
    dataproj.columns = ["fileNameEdsComputer"]
    result = pd.concat([ldsdata, dataproj], axis=1, ignore_index=True)
    # `path` and checkDifferences() are not defined in this snippet
    result.columns = ['fileNameLDS', path]
    # DataFrame.sort() was removed in pandas 0.20; sort_values() replaces it
    result.sort_values(['fileNameLDS', path], ascending=[True, False], inplace=True)
    result.to_csv('list.csv', index=False)
    checkDifferences()
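checkDifferences() is not shown in the question. One possible shape for such a check, sketched here as a hypothetical `check_differences` helper that takes the two lists as single-column DataFrames, is an outer `merge` with `indicator=True`, which labels every row as `both`, `left_only` or `right_only`.

```python
import pandas as pd


def check_differences(local: pd.DataFrame, server: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return the rows whose file name appears on only one side."""
    merged = local.merge(
        server,
        left_on="filenamesLocal",
        right_on="FilenamesServer",
        how="outer",
        indicator=True,  # adds a '_merge' column: 'both', 'left_only' or 'right_only'
    )
    return merged[merged["_merge"] != "both"]


local = pd.DataFrame({"filenamesLocal": ["filea.csv", "filec.csv", "fileab.csv", "filexyz.csv", "fileyh.csv"]})
server = pd.DataFrame({"FilenamesServer": ["fileab.csv", "filea.csv", "filec.csv"]})
diff = check_differences(local, server)
print(diff)
```

The `_merge` column also tells you which side each unmatched name came from, which a plain sort cannot.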

1 answer:

Answer (score: 1)

Setup:

    import pandas as pd
    from io import StringIO

    text = """filenamesLocal          FilenamesServer
      filea.csv                  fileab.csv
      filec.csv                  filea.csv
      fileab.csv                 filec.csv
      filexyz.csv
      fileyh.csv"""

    # Split on runs of whitespace; rows with only one name get NaN in FilenamesServer
    df = pd.read_csv(StringIO(text), sep=r"\s+")

    # Index each column by its own values (drop=False keeps the column as well),
    # then dropna() removes the blank cells of the shorter server list
    fnl = df.iloc[:, [0]].set_index(['filenamesLocal'], drop=False).dropna()
    fns = df.iloc[:, [1]].set_index(['FilenamesServer'], drop=False).dropna()

    print(fnl)

                  filenamesLocal
    filenamesLocal               
    filea.csv           filea.csv
    filec.csv           filec.csv
    fileab.csv         fileab.csv
    filexyz.csv       filexyz.csv
    fileyh.csv         fileyh.csv

    print(fns)

                    FilenamesServer
    FilenamesServer                
    fileab.csv           fileab.csv
    filea.csv             filea.csv
    filec.csv             filec.csv

Align fnl and fns:

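    # Outer join on the file-name index; sort=True keeps the names in
    # alphabetical order, which recent pandas no longer does by default
    aligned = pd.concat([fnl, fns], axis=1, sort=True)

    print(aligned)

                filenamesLocal FilenamesServer
    filea.csv        filea.csv       filea.csv
    fileab.csv      fileab.csv      fileab.csv
    filec.csv        filec.csv       filec.csv
    filexyz.csv    filexyz.csv             NaN
    fileyh.csv      fileyh.csv             NaN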
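The alignment works because each frame is indexed by the file names themselves, and `concat(..., axis=1)` performs an outer join on those labels. A stripped-down sketch with hypothetical file names, using plain Series instead of one-column frames:

```python
import pandas as pd

# Hypothetical toy lists; each Series is re-indexed by its own values.
local = pd.Series(["b.csv", "a.csv", "c.csv"], name="local")
server = pd.Series(["a.csv", "b.csv"], name="server")
local = local.set_axis(local.to_numpy())
server = server.set_axis(server.to_numpy())

# concat along columns is an outer join on the index labels;
# sort=True orders the union of labels alphabetically.
aligned = pd.concat([local, server], axis=1, sort=True)
print(aligned)
```

Equal names land on the same row, and a name present on one side only gets NaN in the other column.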

    # Take each local name, falling back to the server name where the local one is missing
    master = aligned.filenamesLocal.combine_first(aligned.FilenamesServer)

    print(master)

    filea.csv        filea.csv
    fileab.csv      fileab.csv
    filec.csv        filec.csv
    filexyz.csv    filexyz.csv
    fileyh.csv      fileyh.csv
    Name: filenamesLocal, dtype: object
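In the sample every server name also appears locally, so master is identical to filenamesLocal; `combine_first` only fills something in when a name exists on the server alone. The toy Series below (hypothetical values) show that behaviour: values from the calling Series are kept, and only its missing slots are taken from the argument.

```python
import numpy as np
import pandas as pd

# Hypothetical values: label "b" is missing on the left side only.
left = pd.Series(["x", np.nan, "z"], index=["a", "b", "c"])
right = pd.Series(["X", "Y", "Z"], index=["a", "b", "c"])

# combine_first keeps left's values and fills only its NaN slots from right.
filled = left.combine_first(right)
print(filled.tolist())  # ['x', 'Y', 'z']
```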

Assign the names that are missing on either side to a Difference column:

    # Select master's values on rows where either side is missing; assignment
    # aligns on the index, so the matched rows get NaN
    aligned['Difference'] = master[aligned.isnull().any(axis=1)]

    print(aligned)

                filenamesLocal FilenamesServer   Difference
    filea.csv        filea.csv       filea.csv          NaN
    fileab.csv      fileab.csv      fileab.csv          NaN
    filec.csv        filec.csv       filec.csv          NaN
    filexyz.csv    filexyz.csv             NaN  filexyz.csv
    fileyh.csv      fileyh.csv             NaN   fileyh.csv
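The steps of this answer can be collected into one function. This is a sketch, assuming the input has the two-column shape of the question's data and that names are unique within each column (the outer join on the index fails on duplicate labels). `align_and_diff` is a hypothetical name, and `Series.where` is used in place of boolean indexing to blank out the matched rows.

```python
import pandas as pd


def align_and_diff(df: pd.DataFrame, left: str, right: str) -> pd.DataFrame:
    """Hypothetical helper: align two name columns and flag unmatched names."""
    # Drop blank cells, then index each column by its own values so that
    # equal names share a row label.
    lhs = df[left].dropna()
    rhs = df[right].dropna()
    lhs = lhs.set_axis(lhs.to_numpy())
    rhs = rhs.set_axis(rhs.to_numpy())

    # Outer join on the labels; sort=True orders the names alphabetically.
    aligned = pd.concat([lhs, rhs], axis=1, sort=True)

    # A name is a difference when it is missing from either side.
    missing = aligned.isna().any(axis=1)
    master = aligned[left].combine_first(aligned[right])
    aligned["Difference"] = master.where(missing)
    return aligned


df = pd.DataFrame({
    "filenamesLocal": ["filea.csv", "filec.csv", "fileab.csv", "filexyz.csv", "fileyh.csv"],
    "FilenamesServer": ["fileab.csv", "filea.csv", "filec.csv", None, None],
})
result = align_and_diff(df, "filenamesLocal", "FilenamesServer")
print(result)
```

Passing the column names as parameters lets the same function work on the question's own frame, whose columns are named fileNameLDS and path.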