I've run into the following problem.
I have a file, sec_sic.csv, containing this data:
ticker SIC
A 3826
AA 3334
AAL 4512
AAN 7359
AAP 5531
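I need to read the data from sec_sic.csv, compare its SIC column with the SIC Code column of the table on the website (adding the new columns Office and Industry), and create a new SIC file containing all of the new data.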
I tried the following code:
import pandas as pd
import requests
import csv

url = "https://www.sec.gov/info/edgar/siccodes.htm"
r = requests.get(url)
df_list = pd.read_html(r.text)  # this parses all the tables in webpages to a list
df = df_list[0]
#df.set_index('SIC Code', inplace=True)
#print(df.head())
#print(df['Office'])

sic_num = 0
base_df = pd.read_csv('sec_sic.csv')

# Create the output file and write the header row once
with open("sec_sic_to_industry.csv", "w+", newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["ticker", "SIC", "office", "industry"])

print("Num of SIC " + str(len(base_df['SIC'])))

# '<' rather than '<=': the last valid row index is len(base_df['SIC']) - 1
while sic_num < len(base_df['SIC']):
    #print(base_df['SIC'][0])
    # Boolean mask of the website rows whose SIC Code equals this ticker's SIC
    filt = (base_df['SIC'][sic_num] == df['SIC Code'])
    #print(df.loc[filt, ["Office", "Industry Title"]])
    one = df.loc[filt, "Office"]
    two = df.loc[filt, "Industry Title"]
    one_1 = one.to_string()
    two_1 = two.to_string()
    #print(base_df['SIC'][sic_num])
    # Drop the index label before the first space (any padding spaces remain)
    one_2 = one_1.split(" ", 1)[1]
    two_2 = two_1.split(" ", 1)[1]
    SIC = base_df['SIC'][sic_num]
    ticker = base_df['ticker'][sic_num]
    # Reopen the file in append mode and add one row for this ticker
    with open("sec_sic_to_industry.csv", "a+", newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow([ticker, SIC, one_2, two_2])
    sic_num += 1
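The code above turns each matched cell into text with `to_string()` and then splits off the index. A sketch of a more direct per-row lookup reads the stored values with `.iloc[0]`, so no index prefix, padding or display formatting is involved. The helper name `lookup_sic` is hypothetical, and the column names are assumed to be the ones used above.

```python
import pandas as pd


def lookup_sic(sic_table: pd.DataFrame, sic_code: int) -> tuple:
    """Return (office, industry) for one SIC code, or (None, None) if it is absent.

    Hypothetical helper; assumes sic_table has the "SIC Code", "Office" and
    "Industry Title" columns.
    """
    match = sic_table.loc[sic_table["SIC Code"] == sic_code, ["Office", "Industry Title"]]
    if match.empty:
        return None, None
    # .iloc[0] is the first matching row; unpacking it yields the raw cell
    # values, so nothing is padded, prefixed or shortened
    office, industry = match.iloc[0]
    return office, industry
```

Inside the loop, `one_2, two_2 = lookup_sic(df, base_df['SIC'][sic_num])` would replace the `to_string()`/`split()` steps.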
But I have a problem with the text in the last column, industry: sometimes it is not complete.
ticker SIC office industry
ALB 2821 Office of Life Sciences PLASTIC MATERIALS, SYNTH RESINS & NONVULCAN EL...
ALGN 3842 Office of Life Sciences ORTHOPEDIC, PROSTHETIC & SURGICAL APPLIANCES &...
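The cut-off values end in `...` at roughly 50 characters, which matches pandas' `display.max_colwidth` option (default 50). That option controls how pandas renders long strings as text, and `Series.to_string()` in the loop appears to be subject to it as well, so the shortened text may have been written into the CSV itself, not just shown short on screen. A minimal sketch of the option's effect on printing, using an illustrative 60-character value:

```python
import pandas as pd

# Illustrative 60-character value, longer than the default display.max_colwidth (50)
df = pd.DataFrame({"industry": ["T" * 60]})

print(df)  # the printed cell is shortened and ends in "..."

# Temporarily lift the limit; None means no truncation (pandas 1.0 and later)
with pd.option_context("display.max_colwidth", None):
    print(df)

# The stored value itself is never shortened
print(len(df.loc[0, "industry"]))
```

Wrapping the `to_string()` calls in the same `option_context`, or reading the cells directly instead of rendering them as text, would be ways to test whether this is the cause.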