Using pandas, how do I load multiple TXT files that have no header row?

Time: 2018-02-21 02:20:01

Tags: python pandas

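In Python 3 with pandas, I am loading several TXT files. They have no header row and all share the same structure: 46 columns, with the same kind of information in each column. Here are three of them: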

import pandas as pd

candidatos1 = pd.read_csv("candidatos_2014/consulta_cand_2014_AC.txt", sep=';', header=None, encoding='latin_1')

candidatos1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 621 entries, 0 to 620
Data columns (total 46 columns):
0     621 non-null object
1     621 non-null object
2     621 non-null int64
3     621 non-null int64
4     621 non-null object
5     621 non-null object
6     621 non-null object
7     621 non-null object
8     621 non-null int64
9     621 non-null object
10    621 non-null object
11    621 non-null int64
12    621 non-null int64
13    621 non-null int64
14    621 non-null object
15    621 non-null int64
16    621 non-null object
17    621 non-null int64
18    621 non-null object
19    621 non-null object
20    621 non-null int64
21    621 non-null object
22    621 non-null object
23    621 non-null object
24    621 non-null int64
25    621 non-null object
26    621 non-null object
27    621 non-null int64
28    621 non-null int64
29    621 non-null int64
30    621 non-null object
31    621 non-null int64
32    621 non-null object
33    621 non-null int64
34    621 non-null object
35    621 non-null int64
36    621 non-null object
37    621 non-null int64
38    621 non-null object
39    621 non-null object
40    621 non-null int64
41    621 non-null object
42    621 non-null int64
43    621 non-null int64
44    621 non-null object
45    621 non-null object
dtypes: int64(20), object(26)
memory usage: 223.2+ KB

candidatos2 = pd.read_csv("candidatos_2014/consulta_cand_2014_AL.txt",sep=';', header=None, encoding = 'latin_1') 
candidatos2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 479 entries, 0 to 478
Data columns (total 46 columns):
0     479 non-null object
1     479 non-null object
2     479 non-null int64
3     479 non-null int64
4     479 non-null object
5     479 non-null object
6     479 non-null object
7     479 non-null object
8     479 non-null int64
9     479 non-null object
10    479 non-null object
11    479 non-null int64
12    479 non-null int64
13    479 non-null int64
14    479 non-null object
15    479 non-null int64
16    479 non-null object
17    479 non-null int64
18    479 non-null object
19    479 non-null object
20    479 non-null int64
21    479 non-null object
22    479 non-null object
23    479 non-null object
24    479 non-null int64
25    479 non-null object
26    479 non-null object
27    479 non-null int64
28    479 non-null int64
29    479 non-null int64
30    479 non-null object
31    479 non-null int64
32    479 non-null object
33    479 non-null int64
34    479 non-null object
35    479 non-null int64
36    479 non-null object
37    479 non-null int64
38    479 non-null object
39    479 non-null object
40    479 non-null int64
41    479 non-null object
42    479 non-null int64
43    479 non-null int64
44    479 non-null object
45    479 non-null object
dtypes: int64(20), object(26)
memory usage: 172.2+ KB

candidatos3 = pd.read_csv("candidatos_2014/consulta_cand_2014_AM.txt",sep=';', header=None, encoding = 'latin_1') 
candidatos3.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 786 entries, 0 to 785
Data columns (total 46 columns):
0     786 non-null object
1     786 non-null object
2     786 non-null int64
3     786 non-null int64
4     786 non-null object
5     786 non-null object
6     786 non-null object
7     786 non-null object
8     786 non-null int64
9     786 non-null object
10    786 non-null object
11    786 non-null int64
12    786 non-null int64
13    786 non-null int64
14    786 non-null object
15    786 non-null int64
16    786 non-null object
17    786 non-null int64
18    786 non-null object
19    786 non-null object
20    786 non-null int64
21    786 non-null object
22    786 non-null object
23    786 non-null object
24    786 non-null int64
25    786 non-null object
26    786 non-null object
27    786 non-null int64
28    786 non-null int64
29    786 non-null int64
30    786 non-null object
31    786 non-null int64
32    786 non-null object
33    786 non-null int64
34    786 non-null object
35    786 non-null int64
36    786 non-null object
37    786 non-null int64
38    786 non-null object
39    786 non-null object
40    786 non-null int64
41    786 non-null object
42    786 non-null int64
43    786 non-null int64
44    786 non-null object
45    786 non-null object
dtypes: int64(20), object(26)
memory usage: 282.5+ KB

Is there a way to load all of these files into a single DataFrame at once?

Or do I need to load them one at a time and then combine the DataFrames? If so, how?

2 answers:

Answer 0: (score: 4)

In cases like this, I like to feed pandas.concat a list comprehension.

from pathlib import Path
import pandas

def _reader(fname):
    return pandas.read_csv(fname, sep=';', header=None, encoding='latin_1')

folder = Path("candidatos_2014")
df = pandas.concat([
    _reader(txt)
    for txt in folder.glob("*.txt")
])
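
As written, the concatenated frame keeps each file's own 0-based row index, and the order of the files depends on what glob returns. The following is only a sketch of one possible variant, not part of the original answer: the function name `load_folder` and the `source_file` column are hypothetical names introduced here for illustration. It sorts the paths, records which file each row came from, and renumbers the rows.

```python
from pathlib import Path

import pandas as pd


def load_folder(folder, pattern="*.txt"):
    """Read every matching file in `folder` and stack the rows into one DataFrame.

    `load_folder` and the `source_file` column are illustrative names,
    not something the original answer defines.
    """
    # Sort the paths so the row order is reproducible between runs.
    paths = sorted(Path(folder).glob(pattern))
    frames = []
    for path in paths:
        frame = pd.read_csv(path, sep=";", header=None, encoding="latin_1")
        # Remember which file each row came from (hypothetical extra column).
        frame["source_file"] = path.name
        frames.append(frame)
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating
    # each file's own index. pd.concat raises ValueError if no file matched.
    return pd.concat(frames, ignore_index=True)
```

With the folder from the question, this would be called as load_folder("candidatos_2014").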

Answer 1: (score: 1)

You can append the DataFrames to one another after creating them:

    candidatos1.append(candidatos2,ignore_index=True).append(candidatos3,ignore_index=True)
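
DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the chained call above only runs on older versions. A minimal sketch of the same idea with pd.concat follows. The three small frames are made-up stand-ins for candidatos1, candidatos2 and candidatos3 so the snippet runs on its own, and the name `candidatos` for the result is an assumption.

```python
import pandas as pd

# Made-up stand-ins for the three frames loaded in the question.
candidatos1 = pd.DataFrame({0: ["a", "b"], 1: [1, 2]})
candidatos2 = pd.DataFrame({0: ["c"], 1: [3]})
candidatos3 = pd.DataFrame({0: ["d", "e"], 1: [4, 5]})

# One concat call replaces the chained .append calls;
# ignore_index=True renumbers the rows 0..n-1 as before.
candidatos = pd.concat(
    [candidatos1, candidatos2, candidatos3],
    ignore_index=True,
)
```

Passing the whole list to a single pd.concat call also avoids building an intermediate copy for every appended frame.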

You could also concatenate the text files first and then load the result into pandas, but that step happens outside pandas.
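
If you want to try that route without leaving Python, here is a rough sketch. The function name `load_by_joining_text` is hypothetical, and the approach assumes every file has the same column layout and no header row, as in the question. Because pandas then parses one combined text, column dtypes are inferred once over all rows rather than per file.

```python
import io
from pathlib import Path

import pandas as pd


def load_by_joining_text(folder, pattern="*.txt"):
    """Join the raw text of all matching files, then parse it in one read_csv call.

    Hypothetical helper for illustration; assumes identical column layouts
    and no header row in any file.
    """
    chunks = []
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding="latin_1")
        # Make sure the last row of one file does not run into
        # the first row of the next file.
        if text and not text.endswith("\n"):
            text += "\n"
        chunks.append(text)
    # The text is already decoded, so read_csv needs no encoding argument.
    return pd.read_csv(io.StringIO("".join(chunks)), sep=";", header=None)
```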