Dask: concatenating two dataframes with the same columns doesn't work

Time: 2020-09-27 17:29:09

Tags: pandas dask

I have two CSV files without header rows, both with the same comma-separated columns. I tried to read them into a single dataframe with:

dfoutputs = dd.read_csv(['outputsfile.csv', 'outputsfile2.csv'], names=colnames, header=None, dtype={'firstnr': 'Int64', 'secondnr': 'Int64', 'thirdnr': 'Int64', 'fourthnr': 'Int64'})

However, this dataframe appears to contain only the rows from outputsfile.csv.

I get a similar problem when reading the files separately and concatenating them:

import dask.dataframe as dd

colnames = ['firstnr', 'secondnr', 'thirdnr', 'fourthnr']
dfoutputs = dd.read_csv('outputsfile.csv', names=colnames, header=None, dtype={'firstnr': 'Int64', 'secondnr': 'Int64', 'thirdnr': 'Int64', 'fourthnr': 'Int64'})
print(dfoutputs.head(10))

dfoutputs2 = dd.read_csv('outputsfile2.csv', names=colnames, header=None, dtype={'firstnr': 'Int64', 'secondnr': 'Int64', 'thirdnr': 'Int64', 'fourthnr': 'Int64'})
print(dfoutputs2.head(10))

dfnew = dd.concat([dfoutputs, dfoutputs2])
print(dfnew.head(10))

Output:

   firstnr  secondnr  thirdnr    fourthnr
0        0         0        0  5000000000
1        1         0        0  5000000000
2        2         0        0  5000000000
3        3         0        0  5000000000
4        4         0        0  5000000000
5        5         0        0  5000000000

   firstnr  secondnr  thirdnr    fourthnr
0       11         0        0  5000000000
1       12         0        0  5000000000

   firstnr  secondnr  thirdnr    fourthnr
0        0         0        0  5000000000
1        1         0        0  5000000000
2        2         0        0  5000000000
3        3         0        0  5000000000
4        4         0        0  5000000000
5        5         0        0  5000000000

How can I combine the two CSV files into the same Dask dataframe?

1 answer:

Answer 0: (score: 0)

TennisTechBoy suggested in a comment:

# Append the contents of outputsfile2.csv to the end of outputsfile.csv.
# The with statement closes both files even if an error occurs.
with open("outputsfile.csv", "a") as f, open("outputsfile2.csv", "r") as f2:
    for line in f2:
        f.write(line)

Depending on memory constraints, it may be necessary to do this in Dask instead.