I have a DataFrame that I want to rearrange so that column A == column B. Any value without a match should go into a column C.
My data looks like this:
filenamesLocal FilenamesServer
filea.csv fileab.csv
filec.csv filea.csv
fileab.csv filec.csv
filexyz.csv
fileyh.csv
I want to rearrange the rows so that filenamesLocal == FilenamesServer, with the remaining names listed in another column:
filenamesLocal FilenamesServer Difference
filea.csv filea.csv filexyz.csv
filec.csv filec.csv fileyh.csv
fileab.csv fileab.csv
My code so far:
import pandas as pd

ldsdata = pd.read_csv('filelist.csv', sep=" ", header=None)
#data.to_csv("filelist.csv", index=False)
dataproj = pd.read_csv('edslist.txt', sep=" ", header=None)
dataproj.columns = ["fileNameEdsComputer"]
result = pd.concat([ldsdata, dataproj], axis=1, ignore_index=True)
result.columns = ['fileNameLDS', path]  # path is not defined in this snippet
# DataFrame.sort() has been removed from pandas; sort_values() is the replacement
result.sort_values(['fileNameLDS', path], ascending=[True, False], inplace=True)
result.to_csv('list.csv', index=False)
checkDifferences()  # not shown in this snippet
Answer 0 (score: 1)
import pandas as pd
from io import StringIO
text="""filenamesLocal FilenamesServer
filea.csv fileab.csv
filec.csv filea.csv
fileab.csv filec.csv
filexyz.csv
fileyh.csv"""
df = pd.read_csv(StringIO(text), sep=r"\s+")
fnl = df.iloc[:, [0]].set_index(['filenamesLocal'], drop=False).dropna()
fns = df.iloc[:, [1]].set_index(['FilenamesServer'], drop=False).dropna()
print(fnl)
filenamesLocal
filenamesLocal
filea.csv filea.csv
filec.csv filec.csv
fileab.csv fileab.csv
filexyz.csv filexyz.csv
fileyh.csv fileyh.csv
print(fns)
FilenamesServer
FilenamesServer
fileab.csv fileab.csv
filea.csv filea.csv
filec.csv filec.csv
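Align fnl and fns. Because both are indexed by file name, concatenating along axis=1 lines up rows with the same name: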
aligned = pd.concat([fnl, fns], axis=1)
print(aligned)
filenamesLocal FilenamesServer
filea.csv filea.csv filea.csv
fileab.csv fileab.csv fileab.csv
filec.csv filec.csv filec.csv
filexyz.csv filexyz.csv NaN
fileyh.csv fileyh.csv NaN
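The same alignment can probably also be reached with an outer merge, which labels each row with where it was found. This is only a sketch of an alternative, not part of the approach above; the names `local`, `server`, and `merged` are made up for illustration, and the data is retyped from the question.

```python
import pandas as pd

# Hypothetical re-creation of the two columns from the question.
local = pd.DataFrame({"filenamesLocal": [
    "filea.csv", "filec.csv", "fileab.csv", "filexyz.csv", "fileyh.csv"]})
server = pd.DataFrame({"FilenamesServer": [
    "fileab.csv", "filea.csv", "filec.csv"]})

# An outer merge keeps every name from both sides; indicator=True adds a
# "_merge" column with the values "both", "left_only" or "right_only".
merged = local.merge(
    server,
    left_on="filenamesLocal",
    right_on="FilenamesServer",
    how="outer",
    indicator=True,
)

both = merged[merged["_merge"] == "both"]             # present on both sides
local_only = merged[merged["_merge"] == "left_only"]  # missing on the server
print(merged)
```

Row order after an outer merge can differ between pandas versions, so it is safer to filter on `_merge` than to rely on position.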
master = aligned.filenamesLocal.combine_first(aligned.FilenamesServer)
print(master)
filea.csv filea.csv
fileab.csv fileab.csv
filec.csv filec.csv
filexyz.csv filexyz.csv
fileyh.csv fileyh.csv
Name: filenamesLocal, dtype: object
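In the sample data every server name also exists locally, so master ends up identical to the filenamesLocal column. combine_first starts to matter when a name exists only on the server. A small sketch with made-up file names (`a.csv`, `b.csv`, `z.csv` are not from the question):

```python
import pandas as pd

# Hypothetical data: z.csv exists only on the server, a.csv only locally.
local = pd.Series(["a.csv", "b.csv"], index=["a.csv", "b.csv"],
                  name="filenamesLocal")
server = pd.Series(["b.csv", "z.csv"], index=["b.csv", "z.csv"],
                   name="FilenamesServer")

# sort=True keeps the row order deterministic (a.csv, b.csv, z.csv).
aligned = pd.concat([local, server], axis=1, sort=True)

# Take the local name where there is one, otherwise fall back to the server name.
master = aligned["filenamesLocal"].combine_first(aligned["FilenamesServer"])
print(master)
```

Here the `z.csv` row has NaN in filenamesLocal, and combine_first fills it from FilenamesServer, so master holds one name per row.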
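Assign the differences. Only rows that have a NaN in either column get a value in Difference; matched rows stay NaN: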
aligned['Difference'] = master[aligned.isnull().any(axis=1)]
print(aligned)
filenamesLocal FilenamesServer Difference
filea.csv filea.csv filea.csv NaN
fileab.csv fileab.csv fileab.csv NaN
filec.csv filec.csv filec.csv NaN
filexyz.csv filexyz.csv NaN filexyz.csv
fileyh.csv fileyh.csv NaN fileyh.csv
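The frame above still keeps the unmatched names in filenamesLocal. If the exact three-column layout from the question is needed (matched names side by side, leftovers packed at the top of Difference), something along these lines might work. It is a sketch under the assumption that the names within each column are unique; `local`, `server`, `matched`, and `unmatched` are illustrative names.

```python
import pandas as pd

# Data retyped from the question.
local = pd.Series(["filea.csv", "filec.csv", "fileab.csv",
                   "filexyz.csv", "fileyh.csv"])
server = pd.Series(["fileab.csv", "filea.csv", "filec.csv"])

# Names present on both sides, kept in the local order.
matched = local[local.isin(server)].reset_index(drop=True)

# Names present on only one side (local-only first, then server-only).
unmatched = pd.concat([
    local[~local.isin(server)],
    server[~server.isin(local)],
]).reset_index(drop=True)

# Series of different lengths are aligned on their 0..n index;
# the shorter column is padded with NaN.
result = pd.DataFrame({
    "filenamesLocal": matched,
    "FilenamesServer": matched,
    "Difference": unmatched,
})
print(result)
```

Resetting the index on both pieces is what packs the leftover names at the top of Difference rather than leaving them on their original rows.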