I can read the inner tables directly using the code below.
import pandas as pd
url = 'https://s3.amazonaws.com/todel162/AAKAR.html'
# read_html returns a list of all tables containing the match text; fetch once and pick by position
tables = pd.read_html(url, header=0, match='Number of Booked Apartment')
df_one, df_two = tables[1], tables[2]
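But how do I link the inner tables to the main table? For example, the df_one frame mentioned above is linked to serial number 1 of the outer table. Is there a way to extract the outer table so that only serial numbers 1 and 2 are selected?
Update
There is a section called "Building Details". If you visit the page, you will see the first serial number as follows: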
Sr.No. | Project Name | Name | Proposed Date of Completion | Number of Basement's | Number of Plinth | Number of Podium's | Number of Slab of Super Structure | Number of Stilts | Number of Open Parking | Number of Closed Parking
1 | SRUSHTI COMPLEX | A and B | | 0 | 1 | 0 | 5 | 1 | 48 | 1
The second serial number is:
Sr.No. | Project Name | Name | Proposed Date of Completion | Number of Basement's | Number of Plinth | Number of Podium's | Number of Slab of Super Structure | Number of Stilts | Number of Open Parking | Number of Closed Parking
2 | SRUSHTI COMPLEX | C and D | | 0 | 1 | 0 | 5 | 1 | 51 | 1
The df_one dataframe is linked to Sr.No. 1, while df_two is linked to Sr.No. 2.
I would like to add the columns of Sr.No. 1 and Sr.No. 2 to df_one and df_two respectively.
Answer 0 (score: 1)
The documentation says that you should expect to do some manual cleanup after calling pd.read_html(). I am not sure how to extend this code to your other HTML pages, which may be completely different. That said, does this achieve what you want?
# Read df
df_other = pd.read_html(url, header=0, match='Number of Plinth')
# Keep only the targeted columns; have a look at df_other, it's cluttered.
targeted_columns = ['Sr.No.', 'Project Name', 'Name', 'Proposed Date of Completion',
                    'Number of Basement\'s', 'Number of Plinth', 'Number of Podium\'s',
                    'Number of Slab of Super Structure', 'Number of Stilts',
                    'Number of Open Parking', 'Number of Closed Parking']
# Filtering on 'Project Name' == 'SRUSHTI COMPLEX' is an easy way to extract the two rows of interest. Also reset the index, dropping the old one.
df_other = df_other[0].loc[df_other[0]['Project Name']=='SRUSHTI COMPLEX', targeted_columns].reset_index(drop=True)
# This is useful for the merge step later, since the Sr.No. values in df_one and df_two are int
df_other['Sr.No.'] = df_other['Sr.No.'].astype(int)
# Extract the two rows as dataframes that correspond to each frame you mentioned
df_other_one = df_other.iloc[[0]]
df_other_two = df_other.iloc[[1]]
Once that is done, you can use merge to join the dataframes.
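As a rough sketch of that merge step, here is one way it could look. I cannot see the real columns of df_one, so the inner frame below is a made-up stand-in (the 'Apartment Type' column and its values are purely hypothetical). The 'Outer Sr.No.' column and the attach_outer_row helper are also names I introduced for illustration. The idea is to tag every row of an inner table with the serial number of the outer row it belongs to, then merge on that tag.

```python
import pandas as pd

# Stand-in for the cleaned outer table (values taken from the rows quoted in the question).
df_other = pd.DataFrame({
    "Sr.No.": [1, 2],
    "Project Name": ["SRUSHTI COMPLEX", "SRUSHTI COMPLEX"],
    "Name": ["A and B", "C and D"],
    "Number of Open Parking": [48, 51],
})

# Hypothetical inner table; the real df_one comes from pd.read_html and will differ.
df_one = pd.DataFrame({
    "Sr.No.": [1, 2],
    "Apartment Type": ["type-x", "type-y"],      # made-up values
    "Number of Booked Apartment": [0, 0],        # made-up values
})

def attach_outer_row(inner, outer, outer_sr_no):
    """Tag each inner row with its outer Sr.No., then merge the outer columns in."""
    tagged = inner.assign(**{"Outer Sr.No.": outer_sr_no})
    # Rename the outer key so it does not clash with the inner table's own Sr.No.
    outer_renamed = outer.rename(columns={"Sr.No.": "Outer Sr.No."})
    return tagged.merge(outer_renamed, on="Outer Sr.No.", how="left")

merged_one = attach_outer_row(df_one, df_other, 1)
print(merged_one)
```

The same call with outer_sr_no=2 would handle df_two. Renaming the outer key avoids the _x/_y suffixes that merge would otherwise add, since both tables carry their own 'Sr.No.' column.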