I have the following file.txt (abridged):
SICcode Catcode Category SICname MultSIC
0111 A1500 Wheat, corn, soybeans and cash grain Wheat X
0112 A1600 Other commodities (incl rice, peanuts) Rice X
0115 A1500 Wheat, corn, soybeans and cash grain Corn X
0116 A1500 Wheat, corn, soybeans and cash grain Soybeans X
0119 A1500 Wheat, corn, soybeans and cash grain Cash grains, NEC X
0131 A1100 Cotton Cotton X
0132 A1300 Tobacco & Tobacco products Tobacco X
I'm having trouble reading it into a pandas DataFrame. I tried pd.read_csv with the following arguments:
engine='python', sep='Tab'
but it returns the file as a single column:
SICcode Catcode Category SICname MultSIC
0 0111 A1500 Wheat, corn, soybeans...
1 0112 A1600 Other commodities (in...
2 0115 A1500 Wheat, corn, soybeans...
3 0116 A1500 Wheat, corn, soybeans...
I then tried opening it in Gnumeric with 'tab' as the separator, but it also read the file as a single column. Does anyone have any insight into this?
Answer 0 (score: 3)
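If df = pd.read_csv('file.txt', sep='\t') returns a DataFrame with only one column, then file.txt evidently does not use tabs as the separator. (Note also that sep='Tab' means the literal text "Tab", not a tab character; a tab is written '\t'.) Your data is probably just separated by spaces. In that case, you could try
df = pd.read_csv('file.txt', sep=r'\s{2,}', engine='python')
which uses the regex pattern \s{2,} as the separator. This regex matches 2 or more whitespace characters, so the single spaces inside fields such as "Wheat, corn, soybeans and cash grain" are not treated as separators. Passing engine='python' explicitly avoids the warning pandas emits when it falls back from the C engine, which does not support regex separators.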
In [8]: df
Out[8]:
SICcode Catcode Category SICname \
0 111 A1500 Wheat, corn, soybeans and cash grain Wheat
1 112 A1600 Other commodities (incl rice, peanuts) Rice
2 115 A1500 Wheat, corn, soybeans and cash grain Corn
3 116 A1500 Wheat, corn, soybeans and cash grain Soybeans
4 119 A1500 Wheat, corn, soybeans and cash grain Cash grains, NEC
5 131 A1100 Cotton Cotton
6 132 A1300 Tobacco & Tobacco products Tobacco
MultSIC
0 X
1 X
2 X
3 X
4 X
5 X
6 X
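A self-contained sketch of the same approach, assuming (since the real layout of file.txt is not shown) that its columns are padded with at least two spaces. The sample text below is a hypothetical reconstruction built from the rows in the question. It also passes dtype={'SICcode': str}, because the output above shows 0111 read back as 111: pandas parses the column as integers and drops the leading zero.

```python
import io

import pandas as pd

# Hypothetical reconstruction of file.txt: the real column layout is unknown,
# so we ASSUME columns are padded with at least two spaces.
rows = [
    ("SICcode", "Catcode", "Category", "SICname", "MultSIC"),
    ("0111", "A1500", "Wheat, corn, soybeans and cash grain", "Wheat", "X"),
    ("0112", "A1600", "Other commodities (incl rice, peanuts)", "Rice", "X"),
    ("0115", "A1500", "Wheat, corn, soybeans and cash grain", "Corn", "X"),
    ("0116", "A1500", "Wheat, corn, soybeans and cash grain", "Soybeans", "X"),
    ("0119", "A1500", "Wheat, corn, soybeans and cash grain", "Cash grains, NEC", "X"),
    ("0131", "A1100", "Cotton", "Cotton", "X"),
    ("0132", "A1300", "Tobacco & Tobacco products", "Tobacco", "X"),
]
# Pad every column to its widest value plus two spaces, then drop trailing padding.
widths = [max(len(row[i]) for row in rows) + 2 for i in range(len(rows[0]))]
text = "\n".join(
    "".join(field.ljust(width) for field, width in zip(row, widths)).rstrip()
    for row in rows
) + "\n"

df = pd.read_csv(
    io.StringIO(text),
    sep=r"\s{2,}",           # two or more whitespace characters separate columns
    engine="python",         # regex separators require the python engine
    dtype={"SICcode": str},  # keep leading zeros such as "0111"
)
print(df)
```

The two-space threshold is the key design choice: a plain sep=r'\s+' would also split on the single spaces inside Category and SICname and produce far too many fields.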
If this doesn't work, please post the output of
print(repr(open('file.txt', 'rb').read(100)))
This will show us an unambiguous representation of the first 100 bytes of file.txt.
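A minimal sketch of how those raw bytes settle the question: in the repr, a tab appears as the escape \t, while space padding appears as literal spaces. The helper name looks_tab_separated and the two sample files are hypothetical, made up for illustration.

```python
import os
import tempfile


def looks_tab_separated(path, nbytes=100):
    """Hypothetical helper: True if the first nbytes of the file contain a tab byte."""
    with open(path, "rb") as fh:
        head = fh.read(nbytes)
    print(repr(head))  # the same diagnostic the answer asks for
    return b"\t" in head


with tempfile.TemporaryDirectory() as tmp:
    tab_path = os.path.join(tmp, "tabs.txt")
    space_path = os.path.join(tmp, "spaces.txt")
    # Hypothetical sample files: one tab-separated, one padded with spaces.
    with open(tab_path, "w") as fh:
        fh.write("SICcode\tCatcode\tCategory\n0111\tA1500\tWheat\n")
    with open(space_path, "w") as fh:
        fh.write("SICcode  Catcode  Category\n0111     A1500    Wheat\n")
    tab_result = looks_tab_separated(tab_path)
    space_result = looks_tab_separated(space_path)
```

If the check is False, sep='\t' cannot work and a whitespace-based separator is the better choice.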
Answer 1 (score: 1)
If the data in your csv is separated by tabs, I think you can try adding sep="\t" to read_csv.
import pandas as pd
# sep="\t" is a single tab character, unlike sep='Tab', which is literal text
df = pd.read_csv('test/a.csv', sep="\t")
print(df)
SICcode Catcode Category SICname \
0 111 A1500 Wheat, corn, soybeans and cash grain Wheat
1 112 A1600 Other commodities (incl rice, peanuts) Rice
2 115 A1500 Wheat, corn, soybeans and cash grain Corn
3 116 A1500 Wheat, corn, soybeans and cash grain Soybeans
4 119 A1500 Wheat, corn, soybeans and cash grain Cash grains, NEC
5 131 A1100 Cotton Cotton
6 132 A1300 Tobacco & Tobacco products Tobacco
MultSIC
0 X
1 X
2 X
3 X
4 X
5 X
6 X
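A small round-trip sketch contrasting the two separators, using a hypothetical two-row DataFrame built from the question's data. It relies on the documented pandas rule that a separator longer than one character (other than '\s+') is interpreted as a regular expression, so sep='Tab' only splits where the literal text "Tab" appears, which it never does here.

```python
import io

import pandas as pd

# Hypothetical sample data taken from the first two rows in the question.
df = pd.DataFrame({
    "SICcode": ["0111", "0112"],
    "Catcode": ["A1500", "A1600"],
    "Category": ["Wheat, corn, soybeans and cash grain",
                 "Other commodities (incl rice, peanuts)"],
    "SICname": ["Wheat", "Rice"],
    "MultSIC": ["X", "X"],
})
tsv_text = df.to_csv(sep="\t", index=False)  # genuinely tab-separated text

# A real tab separator recovers all five columns.
good = pd.read_csv(io.StringIO(tsv_text), sep="\t", dtype={"SICcode": str})

# "Tab" is treated as a regex matching the word "Tab", so nothing is split.
bad = pd.read_csv(io.StringIO(tsv_text), sep="Tab", engine="python")

print(good.shape, bad.shape)
```

This reproduces the single-column symptom from the question when the file really is tab-separated but sep is given as the word 'Tab'.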