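I have a problem. I want to loop over the files whose names contain, for example, "usr666", load each one into a pandas DataFrame keeping only selected column headers, and then merge them into a single file, as in the following example: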
BT_usr666.csv:
number|size|person|car |
---------------------------
31 |2 |Ringo |Tesla |
82 |3 |Paul |Audi |
93 |2 |John |BMW |
74 |3 |George|MG |
RS_usr666.csv:
number|color|person|doors|car |
---------------------------------
33 |black|Mick |2 |Porsche|
12 |red |Keith |4 |Saab |
55 |blue |Ron |6 |Volvo |
into FINAL_usr666.csv
person|car |
---------------
Ringo |Tesla |
Paul |Audi |
John |BMW |
George|MG |
Mick |Porsche|
Keith |Saab |
Ron |Volvo |
Any ideas?
Answer 0: (score: 1)
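This should do it.
It searches "." (the current directory) for CSV files whose names contain usr666, keeps only the person and car columns from each one, and writes the combined rows to a single file.
import os
import pandas as pd

frames = []
for filename in sorted(os.listdir(".")):
    # The sample files are named BT_usr666.csv and RS_usr666.csv, so match
    # anywhere in the name; startswith("usr666") would find nothing.
    if "usr666" in filename and filename.endswith(".csv"):
        df = pd.read_csv(filename)
        frames.append(df[["person", "car"]])  # keep only the selected columns
# DataFrame.append was removed in pandas 2.0; concatenate once instead.
combined = pd.concat(frames, ignore_index=True)
combined.to_csv("file1.csv", index=False)  # index=False: write only person and car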
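One thing to watch if you name the result FINAL_usr666.csv as in the question: that name also contains usr666, so a second run would read the previous output back in. Below is a minimal sketch of the same idea wrapped in a function that skips its own output file. The function name merge_user_files and its parameters are made up for illustration, and it assumes the inputs are ordinary comma-separated files.

```python
from pathlib import Path

import pandas as pd


def merge_user_files(directory, token="usr666", columns=("person", "car")):
    """Combine the selected columns of every CSV in `directory` whose name contains `token`."""
    directory = Path(directory)
    output = directory / f"FINAL_{token}.csv"
    # sorted() keeps the row order deterministic; the output file is excluded
    # because its own name matches the pattern too.
    sources = [p for p in sorted(directory.glob(f"*{token}*.csv"))
               if p.name != output.name]
    if not sources:
        raise FileNotFoundError(f"no CSV files containing {token!r} in {directory}")
    # usecols avoids loading columns that are thrown away anyway.
    frames = [pd.read_csv(p, usecols=list(columns)) for p in sources]
    # usecols keeps the file's column order, so reorder explicitly.
    merged = pd.concat(frames, ignore_index=True)[list(columns)]
    merged.to_csv(output, index=False)
    return merged
```

Calling merge_user_files(".") in the folder that holds BT_usr666.csv and RS_usr666.csv would write FINAL_usr666.csv next to them and return the combined DataFrame.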
Answer 1: (score: 1)
You can try the script below.
Code
import glob
import os

import pandas as pd


def get_final_df(files):
    your_columns = ['person', 'car']
    # Read only the needed columns from each file.
    frames = [pd.read_csv(file, usecols=your_columns) for file in files]
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
    return pd.concat(frames, ignore_index=True)


if __name__ == '__main__':
    wd = os.getcwd()  # Working directory; change this to the folder that holds your files.
    # Test the file name only, so a 'usr666' elsewhere in the path cannot match.
    files = sorted(f for f in glob.glob(os.path.join(wd, '*.csv'))
                   if 'usr666' in os.path.basename(f))
    final_df = get_final_df(files)
    final_df.to_csv('final_df.csv', index=False)  # Write the combined rows to a file
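One caveat, in case it applies: both scripts assume ordinary comma-separated files. If your files are literally stored the way they are displayed in the question (fields separated by "|", padded with spaces, a dashed line under the header, and a trailing "|"), read_csv needs a little help. Here is a rough sketch under that assumption only; the helper name read_padded_pipe_table is hypothetical.

```python
import io

import pandas as pd


def read_padded_pipe_table(source, columns):
    """Read a '|'-separated table with padded cells and a dashed rule under the header."""
    # skiprows=[1] drops the dashed line; dtype=str keeps every cell as text
    # so the padding can be stripped uniformly.
    raw = pd.read_csv(source, sep="|", skiprows=[1], dtype=str)
    # Header cells carry the same padding as the data cells.
    raw.columns = [str(name).strip() for name in raw.columns]
    # The trailing '|' produces an extra unnamed column; selecting by name drops it.
    picked = raw[list(columns)]
    return picked.apply(lambda col: col.str.strip())


# Small in-memory sample in the layout shown in the question.
sample = (
    "number|size|person|car   |\n"
    "---------------------------\n"
    "31    |2   |Ringo |Tesla |\n"
    "82    |3   |Paul  |Audi  |\n"
)
result = read_padded_pipe_table(io.StringIO(sample), ["person", "car"])
```

The helper accepts a file path as well as a file-like object, so it could stand in for the pd.read_csv call inside get_final_df if your data really looks like this.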