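In Python 3 with pandas, I am loading several TXT files. They have no header row and all share the same structure: 46 columns, each holding the same kind of information in every file. Here are three examples: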
import pandas as pd
candidatos1 = pd.read_csv("candidatos_2014/consulta_cand_2014_AC.txt",sep=';', header=None, encoding = 'latin_1')
candidatos1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 621 entries, 0 to 620
Data columns (total 46 columns):
0 621 non-null object
1 621 non-null object
2 621 non-null int64
3 621 non-null int64
4 621 non-null object
5 621 non-null object
6 621 non-null object
7 621 non-null object
8 621 non-null int64
9 621 non-null object
10 621 non-null object
11 621 non-null int64
12 621 non-null int64
13 621 non-null int64
14 621 non-null object
15 621 non-null int64
16 621 non-null object
17 621 non-null int64
18 621 non-null object
19 621 non-null object
20 621 non-null int64
21 621 non-null object
22 621 non-null object
23 621 non-null object
24 621 non-null int64
25 621 non-null object
26 621 non-null object
27 621 non-null int64
28 621 non-null int64
29 621 non-null int64
30 621 non-null object
31 621 non-null int64
32 621 non-null object
33 621 non-null int64
34 621 non-null object
35 621 non-null int64
36 621 non-null object
37 621 non-null int64
38 621 non-null object
39 621 non-null object
40 621 non-null int64
41 621 non-null object
42 621 non-null int64
43 621 non-null int64
44 621 non-null object
45 621 non-null object
dtypes: int64(20), object(26)
memory usage: 223.2+ KB
candidatos2 = pd.read_csv("candidatos_2014/consulta_cand_2014_AL.txt",sep=';', header=None, encoding = 'latin_1')
candidatos2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 479 entries, 0 to 478
Data columns (total 46 columns):
0 479 non-null object
1 479 non-null object
2 479 non-null int64
3 479 non-null int64
4 479 non-null object
5 479 non-null object
6 479 non-null object
7 479 non-null object
8 479 non-null int64
9 479 non-null object
10 479 non-null object
11 479 non-null int64
12 479 non-null int64
13 479 non-null int64
14 479 non-null object
15 479 non-null int64
16 479 non-null object
17 479 non-null int64
18 479 non-null object
19 479 non-null object
20 479 non-null int64
21 479 non-null object
22 479 non-null object
23 479 non-null object
24 479 non-null int64
25 479 non-null object
26 479 non-null object
27 479 non-null int64
28 479 non-null int64
29 479 non-null int64
30 479 non-null object
31 479 non-null int64
32 479 non-null object
33 479 non-null int64
34 479 non-null object
35 479 non-null int64
36 479 non-null object
37 479 non-null int64
38 479 non-null object
39 479 non-null object
40 479 non-null int64
41 479 non-null object
42 479 non-null int64
43 479 non-null int64
44 479 non-null object
45 479 non-null object
dtypes: int64(20), object(26)
memory usage: 172.2+ KB
candidatos3 = pd.read_csv("candidatos_2014/consulta_cand_2014_AM.txt",sep=';', header=None, encoding = 'latin_1')
candidatos3.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 786 entries, 0 to 785
Data columns (total 46 columns):
0 786 non-null object
1 786 non-null object
2 786 non-null int64
3 786 non-null int64
4 786 non-null object
5 786 non-null object
6 786 non-null object
7 786 non-null object
8 786 non-null int64
9 786 non-null object
10 786 non-null object
11 786 non-null int64
12 786 non-null int64
13 786 non-null int64
14 786 non-null object
15 786 non-null int64
16 786 non-null object
17 786 non-null int64
18 786 non-null object
19 786 non-null object
20 786 non-null int64
21 786 non-null object
22 786 non-null object
23 786 non-null object
24 786 non-null int64
25 786 non-null object
26 786 non-null object
27 786 non-null int64
28 786 non-null int64
29 786 non-null int64
30 786 non-null object
31 786 non-null int64
32 786 non-null object
33 786 non-null int64
34 786 non-null object
35 786 non-null int64
36 786 non-null object
37 786 non-null int64
38 786 non-null object
39 786 non-null object
40 786 non-null int64
41 786 non-null object
42 786 non-null int64
43 786 non-null int64
44 786 non-null object
45 786 non-null object
dtypes: int64(20), object(26)
memory usage: 282.5+ KB
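Is there a way to load all of these files into a single dataframe in one go?
Or do I need to load them one at a time and then combine the dataframes? If so, how?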
Answer 0 (score: 4)
In this situation, I like to feed pandas.concat a list comprehension.
from pathlib import Path
import pandas

def _reader(fname):
    # every file is ';'-separated, has no header row, and is Latin-1 encoded
    return pandas.read_csv(fname, sep=';', header=None, encoding='latin_1')

folder = Path("candidatos_2014")
df = pandas.concat([
    _reader(txt)
    for txt in folder.glob("*.txt")
])
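Note that a plain concat keeps each file's own 0..n-1 index, so index labels repeat in the combined frame. Here is a minimal sketch of one way to get a continuous index and also remember which file each row came from. The function name `load_folder` and the column name `source_file` are hypothetical, chosen only for illustration; they are not part of the original data or of pandas.

```python
from pathlib import Path

import pandas as pd


def load_folder(folder, pattern="*.txt"):
    """Read every matching file in `folder` and stack them into one DataFrame."""
    frames = []
    # sorted() makes the row order reproducible; glob order is otherwise arbitrary
    for txt in sorted(Path(folder).glob(pattern)):
        frame = pd.read_csv(txt, sep=";", header=None, encoding="latin_1")
        # hypothetical extra column recording the originating file name
        frame["source_file"] = txt.name
        frames.append(frame)
    # ignore_index=True replaces the per-file indexes with one continuous RangeIndex
    return pd.concat(frames, ignore_index=True)
```

Under these assumptions, `load_folder("candidatos_2014")` would return all states in a single frame, and filtering on `source_file` recovers the rows of any one file.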
Answer 1 (score: 1)
You can append the dataframes to one another after creating them:
candidatos1.append(candidatos2,ignore_index=True).append(candidatos3,ignore_index=True)
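One caveat: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the chained call above only runs on older versions. A rough equivalent with pd.concat might look like the sketch below. The three tiny frames are made-up stand-ins for candidatos1, candidatos2 and candidatos3, used only to keep the example runnable on its own.

```python
import pandas as pd

# Small stand-ins for candidatos1..3 (made-up values, not the real data)
candidatos1 = pd.DataFrame({0: ["a", "b"], 1: [1, 2]})
candidatos2 = pd.DataFrame({0: ["c"], 1: [3]})
candidatos3 = pd.DataFrame({0: ["d", "e"], 1: [4, 5]})

# A single concat call replaces the chain of .append() calls;
# ignore_index=True renumbers the rows 0..n-1 just as append(..., ignore_index=True) did
candidatos = pd.concat(
    [candidatos1, candidatos2, candidatos3],
    ignore_index=True,
)
```

Passing all frames in one list also avoids copying the accumulated data once per appended frame.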
You could also concatenate the text files first and then load the result into pandas, but that step happens outside of pandas.
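For completeness, here is a minimal sketch of that text-first route done from Python rather than the shell. It assumes, as in the question, that no file has a header row, so the raw lines can simply be joined. The function name `read_joined` is hypothetical.

```python
import io
from pathlib import Path

import pandas as pd


def read_joined(paths, encoding="latin_1"):
    """Join the raw text of several headerless files, then parse it once."""
    parts = []
    for path in paths:
        text = Path(path).read_text(encoding=encoding)
        # make sure the last row of one file does not run into the first row of the next
        if text and not text.endswith("\n"):
            text += "\n"
        parts.append(text)
    # the text is already decoded, so read_csv needs no encoding argument here
    return pd.read_csv(io.StringIO("".join(parts)), sep=";", header=None)
```

With this approach pandas infers column types once over the combined text, whereas reading the files separately infers them per file.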