I want to read multiple files at once. I have data in two files, as shown below:
data:
123.22.21.11,sid
112.112.11.1,john
110.11.23.23,jenny
122.23.21.13,ankit
data1:
145.123.11.1, Joaquin
I tried a few of the answers from this link. Here is my code:
import glob
import os
import pandas as pd
df = pd.concat(map(pd.read_csv, glob.glob(os.path.join(" ", "/home/cloudera/Desktop/sample/*"))))
When I run this code, I get the following output:
>>> df
   123.22.21.11 145.123.11.1 Joaquin    sid
0  112.112.11.1          NaN     NaN    NaN
1  110.11.23.23          NaN     NaN    NaN
2  122.23.21.13          NaN     NaN    NaN
0  112.112.11.1          NaN     NaN   john
1  110.11.23.23          NaN     NaN  jenny
2  122.23.21.13          NaN     NaN  ankit
But when I display the result, I need the values in separate columns rather than the output shown above.
What should I do?
Answer 0: (score: 1)
Your problem is that pd.read_csv() expects column headers/names by default, and concat uses them to match up the columns. You can use partial to pass the header=None kwarg into map.
import glob
import os
import pandas as pd
from functools import partial
mapfunc = partial(pd.read_csv, header=None)
df = pd.concat(map(mapfunc, glob.glob(os.path.join(" ", "/home/cloudera/Desktop/sample/*"))))
Output:
              0        1
0  123.22.21.11      sid
1  112.112.11.1     john
2  110.11.23.23    jenny
3  122.23.21.13    ankit
0  145.123.11.1  Joaquin
You can read more about partial here: Using map() function with keyword arguments
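To make the effect of header=None easier to see, here is a minimal, self-contained sketch. It is not part of the original answer: the temporary directory and the two sample files are assumptions modelled on the question's data, and ignore_index=True is added so the row labels run 0 to 4 instead of restarting at 0 for each file.

```python
import glob
import os
import tempfile
from functools import partial

import pandas as pd

# Hypothetical sample files modelled on the question's data.
samples = {
    "data": "123.22.21.11,sid\n112.112.11.1,john\n110.11.23.23,jenny\n122.23.21.13,ankit\n",
    "data1": "145.123.11.1,Joaquin\n",
}

with tempfile.TemporaryDirectory() as tmpdir:
    for name, text in samples.items():
        with open(os.path.join(tmpdir, name), "w") as handle:
            handle.write(text)

    # Sort the paths so the files are always read in the same order.
    paths = sorted(glob.glob(os.path.join(tmpdir, "*")))

    # Default behaviour: the first row of each file is consumed as its header.
    header_rows = [list(pd.read_csv(path).columns) for path in paths]

    # header=None keeps every row as data and labels the columns 0 and 1.
    read_headerless = partial(pd.read_csv, header=None)
    df = pd.concat(map(read_headerless, paths), ignore_index=True)

print(header_rows)
print(df)
```

With the default settings each file contributes a different set of column names, which is why concat cannot line the columns up and fills the gaps with NaN.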
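It isn't pretty, but you can loop through the directory and use a "counter" variable (interval in the code below) to process that many files at a time.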
import os
import pandas as pd

# Initialize variables
fpath = "C:/Users/5188048/Desktop/example/"
interval = 5
filenames = []
# Loop through the files in the directory
for i, j in enumerate(os.listdir(fpath)):
    # Append the filename to the list initialized above
    filenames.append(j)
    # For every interval'th file, perform this...
    if (i + 1) % interval == 0:
        # Use the first file to initialize the dataframe
        temp_df = pd.read_csv(fpath + filenames[0], header=None)
        # Loop through the remaining files
        for file in filenames[1:]:
            # Concatenate each additional file onto the dataframe
            temp_df = pd.concat([temp_df, pd.read_csv(fpath + file, header=None)], ignore_index=True)
        # Do your manipulation here, for example reset the column names
        temp_df.columns = ['IP_Address', 'Name']
        # Generate the outfile name and path
        out_file = fpath + 'out_file_' + str((i + 1) // interval) + '.csv'
        # Write the outfile to CSV
        temp_df.to_csv(out_file, index=False)
        # Reset the list for the next batch
        filenames = []
# Note: files left over after the last full batch are not processed
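The same batching idea can also be sketched as a small generator, which additionally handles a final batch that has fewer files than the interval. This is only an illustration under stated assumptions: read_in_batches, batch_size, the temporary directory and the part_N.csv sample files are hypothetical names that do not appear in the answer.

```python
import os
import tempfile

import pandas as pd


def read_in_batches(directory, batch_size):
    """Yield one concatenated DataFrame per group of `batch_size` files."""
    # Sort for a deterministic order; os.listdir makes no ordering promise.
    names = sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        frames = [
            pd.read_csv(os.path.join(directory, name), header=None)
            for name in batch
        ]
        # The last batch may hold fewer than batch_size files.
        yield pd.concat(frames, ignore_index=True)


# Usage with hypothetical sample files in a temporary directory.
with tempfile.TemporaryDirectory() as tmpdir:
    for n in range(5):
        with open(os.path.join(tmpdir, f"part_{n}.csv"), "w") as handle:
            handle.write(f"10.0.0.{n},user{n}\n")

    # Consume the generator while the temporary files still exist.
    batches = list(read_in_batches(tmpdir, batch_size=2))
    for frame in batches:
        frame.columns = ["IP_Address", "Name"]

print([len(frame) for frame in batches])
```

Reading each batch with a list comprehension and calling concat once avoids growing the dataframe one file at a time.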
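Answer 1: (score: 1)
I think breaking this into a few steps would be easier and more readable. You will also want to explicitly tell pandas that there is no header by passing header=None to pd.read_csv.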
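The answer's own code is not shown here, so the following is only a sketch of what such a step-by-step version might look like. The function name read_headerless_files, the temporary directory and the sample files are assumptions, and the column names are borrowed from the other answer's snippet.

```python
import glob
import os
import tempfile

import pandas as pd


def read_headerless_files(pattern, columns=None):
    """Read every file matching `pattern` and stack the rows into one frame."""
    # Step 1: collect the file paths (sorted for a stable order).
    paths = sorted(glob.glob(pattern))
    # Step 2: read each file, telling pandas that there is no header row.
    # If `columns` is given, those names are used instead of 0, 1, ...
    frames = [pd.read_csv(path, header=None, names=columns) for path in paths]
    # Step 3: stack the frames and renumber the rows from 0.
    return pd.concat(frames, ignore_index=True)


# Usage with hypothetical sample files modelled on the question's data.
with tempfile.TemporaryDirectory() as tmpdir:
    with open(os.path.join(tmpdir, "data"), "w") as handle:
        handle.write("123.22.21.11,sid\n112.112.11.1,john\n"
                     "110.11.23.23,jenny\n122.23.21.13,ankit\n")
    with open(os.path.join(tmpdir, "data1"), "w") as handle:
        handle.write("145.123.11.1,Joaquin\n")

    result = read_headerless_files(
        os.path.join(tmpdir, "*"), columns=["IP_Address", "Name"]
    )

print(result)
```

Note that pd.concat raises a ValueError when the pattern matches no files, so a real script may want to check that the list of paths is not empty first.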