Question

Using Python 3, I am trying to replace a word in a URL that I have placed in a DataFrame with 732 rows, each holding the same URL. This is the URL: http://dbarchive.biosciencedbc.jp/kyushu-u/hg19/eachData/bed20/**ID**.bed.

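I have another DataFrame containing 732 rows of different experiment IDs. I want to replace the word "ID" in the URL with each experiment ID, so that I end up with an updated DataFrame containing each of the 732 URLs needed to download the .bed files into Python.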

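As a side note: from there, is it possible to download the .bed files into Python without first saving them through a browser and then uploading them into Python?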

Answer 1

Use map with str.format.

import numpy as np
import pandas as pd

# Setup
url = 'http://.../bed20/{}.bed' 

np.random.seed(0)
df = pd.DataFrame({'ID': np.random.choice(100, 5).astype(str)})

df['ID'].map(url.format)

0    http://.../bed20/44.bed
1    http://.../bed20/47.bed
2    http://.../bed20/64.bed
3    http://.../bed20/67.bed
4    http://.../bed20/67.bed
Name: ID, dtype: object

Substitute your own URL and your own DataFrame of IDs.

Alternatively, use a list comprehension (performance should be roughly the same)...

[url.format(x) for x in df['ID']]    
# ['http://.../bed20/44.bed', 
#  'http://.../bed20/47.bed', 
#  'http://.../bed20/64.bed', 
#  'http://.../bed20/67.bed', 
#  'http://.../bed20/67.bed']

df.assign(ID=[url.format(x) for x in df['ID']])

                        ID
0  http://.../bed20/44.bed
1  http://.../bed20/47.bed
2  http://.../bed20/64.bed
3  http://.../bed20/67.bed
4  http://.../bed20/67.bed
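
For the layout described in the question, where one DataFrame holds the same URL repeated and a second holds the IDs, a minimal sketch might look like the following. The frame names urls and ids, the column names, and the sample IDs are all made up for illustration, and the two frames are assumed to have the same length and row order.

```python
import pandas as pd

# Hypothetical stand-ins for the two 732-row frames in the question.
template = 'http://dbarchive.biosciencedbc.jp/kyushu-u/hg19/eachData/bed20/ID.bed'
ids = pd.DataFrame({'ID': ['EXP001', 'EXP002', 'EXP003']})  # made-up IDs
urls = pd.DataFrame({'url': [template] * len(ids)})

# Pair the rows positionally and swap the literal placeholder for each ID.
# Matching on '/ID.bed' rather than 'ID' avoids touching other parts of the URL.
urls['url'] = [
    u.replace('/ID.bed', '/{}.bed'.format(i))
    for u, i in zip(urls['url'], ids['ID'])
]

print(urls['url'].tolist())
```

If the URL is the same in every row, keeping it as a single template string and formatting it once per ID, as above in this answer, is simpler than storing 732 copies.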

Answer 2

I would use apply and format:

fmt = 'http://dbarchive.biosciencedbc.jp/kyushu-u/hg19/eachData/bed20/{}.bed'
df.ID.apply(fmt.format)
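
On the side question of reading the .bed files without going through a browser: pandas.read_csv accepts a URL directly, so one possible sketch is below. It assumes the files are plain tab-separated text with no header row, which this thread does not confirm. The helper name read_bed and the in-memory sample are made up for illustration.

```python
import io
import pandas as pd

def read_bed(source):
    """Read a tab-separated file with no header row into a DataFrame.

    `source` may be a URL, a local path, or a file-like object;
    pandas.read_csv accepts all three.
    """
    return pd.read_csv(source, sep='\t', header=None)

# Offline demonstration with an in-memory file. A real call would pass one
# of the formatted URLs instead, e.g. read_bed(fmt.format(some_id)).
sample = io.StringIO("chr1\t100\t200\nchr2\t300\t400\n")
bed = read_bed(sample)
print(bed.shape)
```

Looping over the 732 formatted URLs and calling such a helper on each would fetch the files one at a time; adding error handling for missing files would be a sensible addition.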

Formatting a string containing a column's values with pandas
