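I have two CSV files: one is a single column with a header, and the other has multiple columns with headers. I want to take the values in the single-column file and search the column with the same header in the other file. When a match is found, I want to print the whole row. I know this is what VLOOKUP does, but the multi-column CSV file is very large and Excel crashes whenever I try to do this with a formula. So I have been trying to use Python as a solution.
I am taking this column: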
age
23
43
18
and searching this table:
Name, age,number,AA,BB,CC,DD,EE
John, 23, 1, 34,35,36,37,38
Mary, 32, 2, 33,34,35,36,37
Jacob , 43, 3, 32,33,34,35,36
Matthew,22, 4, 31,32,33,34,35
Jean, 18, 5, 30,31,32,33,34
in order to print this:
Name, age,number,AA,BB,CC,DD,EE
John, 23, 1, 34,35,36,37,38
Jacob , 43, 3, 32,33,34,35,36
Jean, 18, 5, 30,31,32,33,34
I have been trying to use this code, but I have mixed everything up and it just prints the first row as a column:
with open('/home/s/Untitled 1.csv') as f:
    r=pandas.read_csv(f)
with open('/home/s/Test1.csv','r') as w:
    x=pd.read_csv(w)
    col=w['age']
    for line in w:
        for col in w:
            for row in r:
                if row in col:
                    print(line)
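Basically, I want the script to take the first entry in the query column, search the column with the same header in the data table, print the matching row, and then repeat for the remaining entries in the rows below.
Any suggestions would be much appreciated!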
Answer 0 (score: 2)
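There are quite a few problems with your code, and they suggest you are confused about what each object is.
with open('/home/s/Untitled 1.csv') as f:
    r = pandas.read_csv(f)
with open('/home/s/Test1.csv','r') as w:
    x = pandas.read_csv(w)
    # w is a file object, not the DataFrame x, so it cannot be indexed by column name
    col = w['age']
    for line in w:
        # w is not a table, so it has no columns to loop over
        for col in w:
            for row in r:
                if row in col:
                    print(line)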
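To see why those loops cannot do what you expect, it may help to look at what iteration actually yields. This is only a minimal sketch: the small in-memory table built with `io.StringIO` is a stand-in for your real file, and the names `table_csv`, `labels` and `age_values` are made up for the illustration.

```python
import io

import pandas

# Stand-in for the data file; the real script would pass a path instead.
table_csv = io.StringIO("Name,age,number\nJohn,23,1\nMary,32,2\n")
x = pandas.read_csv(table_csv)

# Iterating over a DataFrame yields its column labels, not its rows.
labels = [label for label in x]

# Indexing the DataFrame (not the file object) by header gives the column.
age_values = x["age"].tolist()

print(labels)
print(age_values)
```

So a bare `for row in r` walks over header names, and the column has to come from the DataFrame returned by `read_csv`, not from the file handle.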
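I think it would help if I broke the problem down for you: read the data into a pandas DataFrame, then go through that pandas DataFrame to find the age matches.
The ages you are looking up can be kept in a list instead of a DataFrame. You will see why later.
import pandas

ages = []
with open("incsv1.csv", "r") as f:
    r = pandas.read_csv(f)
    ages = list(r["age"])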
This part you already had:
with open("incsv2.csv", "r") as f:
    x = pandas.read_csv(f)
Since you know you only need to go through the age column, just index it and iterate over it:
for i, age in enumerate(x["age"]):
    # True when this row's age is one of the lookup values
    if age in ages:
        print(x.loc[i])
The whole program will output:
Name John
age 23
number 1
AA 34
BB 35
CC 36
DD 37
EE 38
Name: 0, dtype: object
Name Jacob
age 43
number 3
AA 32
BB 33
CC 34
DD 35
EE 36
Name: 2, dtype: object
Name Jean
age 18
number 5
AA 30
BB 31
CC 32
DD 33
EE 34
Name: 4, dtype: object
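If you do stay with pandas, the same filter can be written without an explicit loop by using `Series.isin`. The following is a sketch under some assumptions: the two `io.StringIO` objects stand in for your files, the sample is shortened to four columns, and `skipinitialspace=True` is there because the sample table has a blank after some commas (without it the header would be read as `" age"`).

```python
import io

import pandas

# Stand-ins for the two CSV files; real code would pass file paths.
lookup_csv = io.StringIO("age\n23\n43\n18\n")
table_csv = io.StringIO(
    "Name, age,number,AA\n"
    "John, 23, 1, 34\n"
    "Mary, 32, 2, 33\n"
    "Jacob , 43, 3, 32\n"
    "Jean, 18, 5, 30\n"
)

# skipinitialspace=True drops the blank after each comma, so the header
# is read as "age" rather than " age".
lookup = pandas.read_csv(lookup_csv)
table = pandas.read_csv(table_csv, skipinitialspace=True)

# Boolean mask: True where the row's age appears in the lookup column.
matches = table[table["age"].isin(lookup["age"])]

# to_csv with no path returns the CSV text, one line per matching row.
print(matches.to_csv(index=False), end="")
```

This prints the header followed by one comma-separated line per match, which is closer to the layout asked for in the question than the per-row Series output.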
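Now, I know you want each match printed on a single line, so here is something I think works better:
import pandas

ages = []
with open("incsv1.csv", "r") as f:
    r = pandas.read_csv(f)
    ages = list(r["age"])
with open("incsv2.csv", "r") as f:
    # Skip the header line
    f.readline()
    for line in f:
        # The age is the second comma-separated field
        if int(line.split(",")[1]) in ages:
            # line already ends with a newline, so do not add another
            print(line, end="")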
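As you can see, you do not really need pandas for this problem. In fact, I can remove it entirely:
ages = []
with open("incsv1.csv", "r") as f:
    # Skip the header line
    f.readline()
    for line in f:
        ages.append(int(line.strip("\n")))
with open("incsv2.csv", "r") as f:
    # Skip the header line
    f.readline()
    for line in f:
        # The age is the second comma-separated field
        if int(line.split(",")[1]) in ages:
            # line already ends with a newline, so do not add another
            print(line, end="")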
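Because your table is very large, a variation on the last version may be worth considering: keep the lookup values in a set and find the column by its header name instead of hard-coding position 1. This is a sketch, not a drop-in replacement. The function name `matching_rows` and the in-memory sample files are invented for the example, and it assumes the lookup file's header is exactly `age` and that values can be compared as stripped strings (so `23` and `023` would not match).

```python
import csv
import io


def matching_rows(lookup_file, table_file, column="age"):
    """Yield the header, then each table row whose `column` is in the lookup file."""
    # A set makes each membership test constant time on average.
    wanted = {row[column].strip() for row in csv.DictReader(lookup_file)}

    reader = csv.reader(table_file)
    header = next(reader)
    # Locate the column by header name, ignoring stray spaces.
    index = [name.strip() for name in header].index(column)

    yield header
    for row in reader:
        # Blank lines come back as empty lists; skip them.
        if row and row[index].strip() in wanted:
            yield row


# Stand-ins for the two files; real code would use open(path, newline="").
lookup_file = io.StringIO("age\n23\n43\n18\n")
table_file = io.StringIO(
    "Name, age,number\nJohn, 23, 1\nMary, 32, 2\nJacob , 43, 3\nJean, 18, 5\n"
)

rows = list(matching_rows(lookup_file, table_file))
for row in rows:
    print(",".join(row))
```

The table is read one row at a time, so memory use depends on the number of lookup values rather than on the size of the big file.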