Merge multiple .txt files into a csv

Time: 2016-09-05 06:51:50

Tags: python-2.7 csv pandas dataframe

* New to Python.

I'm trying to merge multiple text files into one csv. Example below:

filename.csv

Alpha
0
0.1
0.15
0.2
0.25
0.3

text1.txt

Alpha,Beta
0,10
0.2,20
0.3,30

text2.txt

Alpha,Charlie
0.1,5
0.15,15

text3.txt

Alpha,Delta
0.1,10
0.15,20
0.2,50
0.3,10

Desired output in the csv file:

filename.csv

Alpha  Beta  Charlie  Delta
  0     10     0        0
  0.1    0     5        10
  0.15   0     15       20
  0.2   20     0        50
  0.25   0     0        0
  0.3   30     0        10

The code I have been using, like other code I was given, produces a result similar to the one shown below the code.

import glob
import os

import pandas

def mergeData(indir="Dir Path", outdir="Dir Path"):
    dfs = []
    os.chdir(indir)
    fileList=glob.glob("*.txt")
    for filename in fileList:
        left= "/Path/Final.csv"
        right = filename
        output = "/Path/finalMerged.csv"
        leftDf = pandas.read_csv(left)
        rightDf = pandas.read_csv(right)
        mergedDf = pandas.merge(leftDf,rightDf,how='inner',on="Alpha", sort=True)
        dfs.append(mergedDf)
    outputDf = pandas.concat(dfs, ignore_index=True)
    outputDf = pandas.merge(leftDf, outputDf, how='inner', on='Alpha', sort=True, copy=False).fillna(0)
    print (outputDf)

    outputDf.to_csv(output, index=0)

mergeData()

However, the result I get is not the expected one:

Alpha  Beta  Charlie  Delta
  0     10     0        0
  0.1    0     5        0
  0.1    0     0        10
  0.15   0     15       0
  0.15   0     0        20
  0.2   20     0        0
  0.2    0     0        50
  0.25   0     0        0
  0.3   30     0        0
  0.3    0     0        10
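A likely cause of the duplicated rows: pandas.concat stacks the per-file merge results vertically, so an Alpha value that occurs in several .txt files (0.1, 0.15, 0.2 and 0.3 in the sample) gets one row per file. The following is a minimal sketch, not taken from either answer, that collapses the stacked rows with groupby("Alpha").sum() (sum skips NaN, so each column keeps its one real value) and then left-joins onto the Alpha list. It assumes the sample files are held in memory via io.StringIO instead of being read from a directory; the names alpha_csv, txt_files, left_df and result are hypothetical.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of on disk (assumption).
alpha_csv = "Alpha\n0\n0.1\n0.15\n0.2\n0.25\n0.3\n"
txt_files = {
    "text1.txt": "Alpha,Beta\n0,10\n0.2,20\n0.3,30\n",
    "text2.txt": "Alpha,Charlie\n0.1,5\n0.15,15\n",
    "text3.txt": "Alpha,Delta\n0.1,10\n0.15,20\n0.2,50\n0.3,10\n",
}

left_df = pd.read_csv(io.StringIO(alpha_csv))
frames = [pd.read_csv(io.StringIO(text)) for text in txt_files.values()]

# Stacking vertically: each Alpha gets one row per file it appears in.
stacked = pd.concat(frames, ignore_index=True)

# Collapse the duplicates; sum() skips NaN, so each column keeps its single value.
collapsed = stacked.groupby("Alpha").sum()

# Left-join onto the full Alpha list so 0.25 appears, then fill gaps with 0.
result = left_df.merge(collapsed.reset_index(), how="left", on="Alpha").fillna(0)
value_cols = [c for c in result.columns if c != "Alpha"]
result[value_cols] = result[value_cols].astype(int)
print(result)
```

This only works because each file contributes a different column; if two files shared a column name, summing would add their values together.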

2 answers:

Answer 0 (score: 0)

IIUC you can create a list dfs of all DataFrames, append each mergedDf to it in the loop, and finally concat all the DataFrames into one:

import pandas
import glob
import os

def mergeData(indir="dir/path", outdir="dir/path"):
    dfs = []
    os.chdir(indir)
    fileList=glob.glob("*.txt")
    for filename in fileList:
        left= "/path/filename.csv"
        right = filename
        output = "/path/filename.csv"
        leftDf = pandas.read_csv(left)
        rightDf = pandas.read_csv(right)
        mergedDf = pandas.merge(leftDf,rightDf,how='right',on="Alpha", sort=True)
        dfs.append(mergedDf)
    outputDf = pandas.concat(dfs, ignore_index=True)
    #add missing rows from leftDf (in sample Alpha - 0.25)
    #fill NaN values by 0
    outputDf = pandas.merge(leftDf,outputDf,how='left',on="Alpha", sort=True).fillna(0)
    #columns are converted to int
    outputDf[['Beta', 'Charlie']] = outputDf[['Beta', 'Charlie']].astype(int)
    print (outputDf)
    outputDf.to_csv(output, index=0)

mergeData()

   Alpha  Beta  Charlie
0   0.00    10        0
1   0.10     0        5
2   0.15     0       15
3   0.20    20        0
4   0.25     0        0
5   0.30    30        0

EDIT:

The problem is that you need to change the parameter how='inner' to how='left' in the second merge:
import glob
import os

import pandas

def mergeData(indir="Dir Path", outdir="Dir Path"):
    dfs = []
    os.chdir(indir)
    fileList=glob.glob("*.txt")
    for filename in fileList:
        left= "/Path/Final.csv"
        right = filename
        output = "/Path/finalMerged.csv"
        leftDf = pandas.read_csv(left)
        rightDf = pandas.read_csv(right)
        mergedDf = pandas.merge(leftDf,rightDf,how='inner',on="Alpha", sort=True)
        dfs.append(mergedDf)
    outputDf = pandas.concat(dfs, ignore_index=True)
    #need left join, not inner
    outputDf = pandas.merge(leftDf, outputDf, how='left', on='Alpha', sort=True, copy=False).fillna(0)
    print (outputDf)

    outputDf.to_csv(output, index=0)

mergeData()
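A left join on the second merge adds the missing 0.25 row, but any Alpha value present in more than one .txt file still appears once per file, because the per-file results are stacked by concat. An alternative, not from this answer, is to join the files side by side so rows are matched rather than stacked. Below is a minimal sketch assuming the files are already read into a list of DataFrames (in-memory io.StringIO data stands in for the directory); it chains outer merges on Alpha with functools.reduce, then left-joins onto the Alpha list from filename.csv. The function name merge_side_by_side is hypothetical.

```python
import io
from functools import reduce

import pandas as pd

# In-memory stand-ins for filename.csv and the .txt files (assumption).
alpha_csv = "Alpha\n0\n0.1\n0.15\n0.2\n0.25\n0.3\n"
txt_contents = [
    "Alpha,Beta\n0,10\n0.2,20\n0.3,30\n",
    "Alpha,Charlie\n0.1,5\n0.15,15\n",
    "Alpha,Delta\n0.1,10\n0.15,20\n0.2,50\n0.3,10\n",
]


def merge_side_by_side(alpha_df, frames):
    """Outer-merge every frame on Alpha, then align to alpha_df's Alpha values."""
    # Each merge adds the next file's column(s); matching Alpha rows are joined.
    combined = reduce(lambda a, b: pd.merge(a, b, how="outer", on="Alpha"), frames)
    # Left join keeps exactly the Alpha values (and order) listed in alpha_df.
    out = pd.merge(alpha_df, combined, how="left", on="Alpha").fillna(0)
    value_cols = [c for c in out.columns if c != "Alpha"]
    out[value_cols] = out[value_cols].astype(int)
    return out


alpha_df = pd.read_csv(io.StringIO(alpha_csv))
frames = [pd.read_csv(io.StringIO(text)) for text in txt_contents]
result = merge_side_by_side(alpha_df, frames)
print(result)
```

With real files, frames would be built from glob.glob("*.txt") as in the question, and the result written with result.to_csv(path, index=False).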

Answer 1 (score: 0)

import pandas as pd
data1 = pd.read_csv('samp1.csv',sep=',')
data2 = pd.read_csv('samp2.csv',sep=',')
data3 = pd.read_csv('samp3.csv',sep=',')
df1 = pd.DataFrame({'Alpha':data1.Alpha})
df2 = pd.DataFrame({'Alpha':data2.Alpha,'Beta':data2.Beta})
df3 = pd.DataFrame({'Alpha':data3.Alpha,'Charlie':data3.Charlie})
mergedDf = pd.merge(df1, df2, how='outer', on ='Alpha',sort=False)
mergedDf1 = pd.merge(mergedDf, df3, how='outer', on ='Alpha',sort=False)
a = pd.DataFrame(mergedDf1)
print(a.drop_duplicates())

output:
  Alpha  Beta  Charlie
0   0.00  10.0      NaN
1   0.10   NaN      5.0
2   0.15   NaN     15.0
3   0.20  20.0      NaN
4   0.25   NaN      NaN
5   0.30  30.0      NaN
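The chained outer merges above can be generalized to any number of files by indexing each frame on Alpha and concatenating along the columns: pd.concat(..., axis=1) aligns rows by index with an outer join, reindex keeps exactly the Alpha values from filename.csv (adding the 0.25 row), and fillna(0) replaces the NaN shown in the output. Here is a minimal sketch under the assumption that Alpha values are unique within each file (axis=1 concatenation cannot align duplicate index labels); in-memory io.StringIO data stands in for the files, and names such as alpha_values and wide are hypothetical.

```python
import io

import pandas as pd

# In-memory stand-ins for filename.csv and the .txt files (assumption).
alpha_csv = "Alpha\n0\n0.1\n0.15\n0.2\n0.25\n0.3\n"
txt_contents = [
    "Alpha,Beta\n0,10\n0.2,20\n0.3,30\n",
    "Alpha,Charlie\n0.1,5\n0.15,15\n",
    "Alpha,Delta\n0.1,10\n0.15,20\n0.2,50\n0.3,10\n",
]

alpha_values = pd.read_csv(io.StringIO(alpha_csv))["Alpha"].tolist()

# Index every file by Alpha so concat can align rows on it (assumes unique Alpha per file).
indexed = [pd.read_csv(io.StringIO(text)).set_index("Alpha") for text in txt_contents]

# axis=1 places the frames side by side, outer-joining on the Alpha index.
wide = pd.concat(indexed, axis=1)

# Keep exactly the Alpha values from filename.csv, in that order; missing cells become 0.
result = (
    wide.reindex(alpha_values)
    .rename_axis("Alpha")
    .fillna(0)
    .astype(int)
    .reset_index()
)
print(result)
```

Unlike drop_duplicates on the merged frame, this never creates duplicate rows to begin with, and it needs no per-file column names in the code.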