I wrote some code to help me retrieve data from a CSV file:
import re
keywords = {"metal", "energy", "team", "sheet", "solar", "financial", "transportation", "electrical", "scientists",
            "electronic", "workers"}  # all your keywords
keyre = re.compile("energy", re.IGNORECASE)
with open("2006-data-8-8-2016.csv") as infile:
    with open("new_data.csv", "w") as outfile:
        outfile.write(infile.readline())  # Save the header
        for line in infile:
            if len(keyre.findall(line)) > 0:
                outfile.write(line)
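I need it to look for every keyword in two main columns, "position" and "Job description", then take each whole row that contains any of these words and write it to a new file. Any ideas on the simplest way to do this?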
Answer 0: (score: 0)
Try this: loop over the DataFrame, collect the matching rows, and write the new DataFrame back to a CSV file.
import pandas as pd
keywords = {"metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists",
            "electronic", "workers"}  # all your keywords
df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")
listMatchPosition = []
listMatchDescription = []
for i in range(len(df.index)):
    # keep the row if any keyword is a substring of either column (case-sensitive)
    if any(x in df['position'][i] or x in df['Job description'][i] for x in keywords):
        listMatchPosition.append(df['position'][i])
        listMatchDescription.append(df['Job description'][i])
# note: only the two searched columns end up in the output
output = pd.DataFrame({'position': listMatchPosition, 'Job description': listMatchDescription})
output.to_csv("new_data.csv", index=False)
Edit: if there are many columns you want to keep in the output, this modified code will do the job.
df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")
output = pd.DataFrame(columns=df.columns)
for i in range(len(df.index)):
    if any(x in df['position'][i] or x in df['Job description'][i] for x in keywords):
        output.loc[len(output)] = [df[j][i] for j in df.columns]
output.to_csv("new_data.csv", index=False)
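For comparison, here is a rough sketch of the same filter using only the standard library, which stays closer to the code in the question. It assumes the header names are exactly "position" and "Job description" and that matching should be case-insensitive on substrings. The `filter_rows` helper and its parameters are names made up for this illustration.

```python
import csv
import re

KEYWORDS = ["metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists",
            "electronic", "workers"]


def filter_rows(infile, outfile, keywords, columns=("position", "Job description")):
    """Copy the header and every row whose `columns` mention a keyword.

    `infile` and `outfile` are open text file objects.
    Returns the number of data rows written (header not counted).
    """
    # One case-insensitive pattern that matches any keyword as a substring.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    reader = csv.DictReader(infile)
    # extrasaction="ignore" drops surplus cells in malformed rows instead of raising.
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for row in reader:
        # `or ""` guards against missing cells, which DictReader returns as None.
        if any(pattern.search(row.get(col) or "") for col in columns):
            writer.writerow(row)
            written += 1
    return written


# Usage with the files from the question (assumed to exist):
# with open("2006-data-8-8-2016.csv", newline="") as src, \
#         open("new_data.csv", "w", newline="") as dst:
#     filter_rows(src, dst, KEYWORDS)
```

Because it goes through the csv module instead of scanning raw lines, a keyword that appears in some other column does not cause a match, and quoted fields containing commas are parsed correctly.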
Answer 1: (score: 0)
If you want to find the rows where a column's value is exactly one of the words in the keyword list, you can do it with pandas:
import pandas as pd
keywords = ["metal", "energy", "team", "sheet", "solar", "financial", "transportation", "electrical", "scientists",
            "electronic", "workers"]
# read the csv data into a dataframe
# change "," to the data separator in your csv file
df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")
# filter the data: keep only the rows whose position or Job description
# value is equal to one of the keywords
df = df[df["position"].isin(keywords) | df["Job description"].isin(keywords)]
# write the data back to a csv file
df.to_csv("new_data.csv", sep=",", index=False)
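A small illustration of what `isin` does and does not match, using made-up sample rows (the data below is hypothetical and not taken from the CSV in the question):

```python
import pandas as pd

# Hypothetical sample rows, only to illustrate the matching rule.
df = pd.DataFrame({
    "position": ["energy", "Energy Analyst", "clerk"],
    "Job description": ["plans budgets", "models the grid", "team"],
})
keywords = ["energy", "team"]

# isin() is True only when the entire cell equals one of the keywords,
# so "Energy Analyst" is not matched by "energy".
exact = df["position"].isin(keywords) | df["Job description"].isin(keywords)
matched_positions = df.loc[exact, "position"].tolist()
print(matched_positions)
```

The first and third rows are kept because one of their cells is exactly a keyword. The second row is dropped, which is why this variant is rarely what you want for free-text columns.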
If you are looking for substrings within the rows (for example, finding financial inside financial engineering), you can do the following:
import pandas as pd
keywords = ["metal", "energy", "team", "sheet", "solar", "financial", "transportation", "electrical", "scientists",
            "electronic", "workers"]
# join the keywords into one regular expression: "metal|energy|..."
searched_keywords = '|'.join(keywords)
# read the csv data into a dataframe
# change "," to the data separator in your csv file
df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")
# filter the data: keep only the rows that contain one of the keywords
# in the position or the Job description columns
df = df[df["position"].str.contains(searched_keywords) | df["Job description"].str.contains(searched_keywords)]
# write the data back to a csv file
df.to_csv("new_data.csv", sep=",", index=False)
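Two things can trip up the substring version on real data: `str.contains` is case-sensitive by default, and empty cells produce missing values in the mask, which boolean indexing rejects. Below is a hedged sketch of a more defensive variant. The `keyword_mask` helper is a hypothetical name, and it assumes you want case-insensitive matching and that empty cells should count as "no match".

```python
import re

import pandas as pd


def keyword_mask(df, keywords, columns=("position", "Job description")):
    """Return a boolean Series: True where any keyword occurs in any of `columns`."""
    # Escape the keywords so characters such as "+" or "." are matched literally.
    pattern = "|".join(re.escape(k) for k in keywords)
    mask = pd.Series(False, index=df.index)
    for col in columns:
        # Treat empty cells as empty strings so they simply do not match.
        text = df[col].fillna("").astype(str)
        mask = mask | text.str.contains(pattern, case=False, regex=True)
    return mask


# Usage with the file from the question (assumed to exist):
# df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")
# df[keyword_mask(df, keywords)].to_csv("new_data.csv", sep=",", index=False)
```

All columns of the matching rows are kept, since the mask is only used to select rows.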