I'm still new to Python and using .replace for the first time, and I've run into a strange problem.
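# Imports the script needs; the bs4 alias is assumed from the bs4(...) calls below.
import numpy as np
import requests
from bs4 import BeautifulSoup as bs4
url_base = 'http://sfbay.craigslist.org/search/eby/apa'
params = dict(bedrooms=1, is_furnished=1)
rsp = requests.get(url_base, params=params)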
# BS4 can quickly parse our text, make sure to tell it that you're giving html
html = bs4(rsp.text, 'html.parser')
# BS makes it easy to look through a document
print(html.prettify()[:1000])
# find_all will pull entries that fit your search criteria.
# Note that we have to use brackets to define the `attrs` dictionary
# Because "class" is a special word in python, so we need to give a string.
apts = html.find_all('p', attrs={'class': 'row'})
print(len(apts))
# We can see that there's a consistent structure to a listing.
# There is a 'time', a 'name', a 'housing' field with size/n_brs, etc.
this_appt = apts[15]
print(this_appt.prettify())
# So now we'll pull out a couple of things we might be interested in:
# It looks like "housing" contains size information. We'll pull that.
# Note that `findAll` returns a list, since there's only one entry in
# this HTML, we'll just pull the first item.
size = this_appt.findAll(attrs={'class': 'housing'})[0].text
print(size) , 'this is the size'
def find_size_and_brs(size):
    split = size.strip('/- ').split(' - ')
    print len(split)
    if 'br' in split[0] and 'ft2' in split[0]:
        print 'We made it into 1'
        n_brs = split[0].replace('br -', '',)
        this_size = split[0].replace('ft2 -', '')
    elif 'br' in split[0]:
        print 'we are in 2'
        # It's the n_bedrooms
        n_brs = split[0].replace('br', '')
        this_size = np.nan
    elif 'ft2' in split[0]:
        print 'we are in 3'
        # It's the size
        this_size = split[0].replace('ft2', '')
        n_brs = np.nan
    print n_brs
    print this_size
    return float(this_size), float(n_brs)
this_size, n_brs = find_size_and_brs(size)
Output:
We made it into 1
1
800ft2 -
1br -
800
I can't figure out why the data is printed twice, with only one of the two replacements applied each time.
Any thoughts? Thanks.
Answer 0 (score: 0)
It works for me now. I made a few changes using strip() and split(), and marked each one with a # <- here comment.
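# Imports the script needs; the bs4 alias is assumed from the bs4(...) calls below.
import numpy as np
import requests
from bs4 import BeautifulSoup as bs4
url_base = 'http://sfbay.craigslist.org/search/eby/apa'
params = dict(bedrooms=1, is_furnished=1)
rsp = requests.get(url_base, params=params)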
# BS4 can quickly parse our text, make sure to tell it that you're giving html
html = bs4(rsp.text, 'html.parser')
# BS makes it easy to look through a document
#print(html.prettify()[:1000])
# find_all will pull entries that fit your search criteria.
# Note that we have to use brackets to define the `attrs` dictionary
# Because "class" is a special word in python, so we need to give a string.
apts = html.find_all('p', attrs={'class': 'row'})
#print(len(apts))
# We can see that there's a consistent structure to a listing.
# There is a 'time', a 'name', a 'housing' field with size/n_brs, etc.
this_appt = apts[15]
#print(this_appt.prettify())
# So now we'll pull out a couple of things we might be interested in:
# It looks like "housing" contains size information. We'll pull that.
# Note that `findAll` returns a list, since there's only one entry in
# this HTML, we'll just pull the first item.
size = this_appt.findAll(attrs={'class': 'housing'})[0].text
#print(size) , 'this is the size'
def find_size_and_brs(size):
    split = size.strip().split(' - ')  # <- here strip()
    #print len(split)
    if 'br' in split[0] and 'ft2' in split[0]:
        print 'We made it into 1'
        two = split[0].split('\n')  # <- here split()
        n_brs = two[0].replace('br -', '',).strip()  # <- here two[0] and strip()
        this_size = two[1].replace('ft2 -', '').strip()  # <- here two[1] and strip()
        #print '>', n_brs, '<'
        #print '>', this_size, '<'
    elif 'br' in split[0]:
        print 'we are in 2'
        # It's the n_bedrooms
        n_brs = split[0].replace('br', '')
        this_size = np.nan
    elif 'ft2' in split[0]:
        print 'we are in 3'
        # It's the size
        this_size = split[0].replace('ft2', '')
        n_brs = np.nan
    print n_brs
    print this_size
    return float(this_size), float(n_brs)
this_size, n_brs = find_size_and_brs(size)
print '>', this_size, '<'
print '>', n_brs, '<'
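To see why the original version seemed to print everything twice, here is a minimal sketch in Python 3 syntax (print() as a function). The sample string is an assumption inferred from the output in the question, not the exact text of the listing: both values appear to sit in one string separated by a newline, so each replace() call returns the whole string with only its own suffix removed.

```python
# Hypothetical sample, inferred from the output shown in the question:
# bedrooms and size share one string, separated by a newline.
housing = "1br -\n800ft2 -"

# str.replace returns a copy of the WHOLE string with only the matched
# substring removed, so the other value is still present in the result.
n_brs_wrong = housing.replace("br -", "")
size_wrong = housing.replace("ft2 -", "")
print(repr(n_brs_wrong))
print(repr(size_wrong))

# Splitting on the newline first isolates each value before replacing.
br_part, size_part = housing.split("\n")
n_brs = float(br_part.replace("br -", "").strip())
this_size = float(size_part.replace("ft2 -", "").strip())
print(n_brs, this_size)
```

Each "wrong" value still contains the other field, which is why printing the two of them looked like the data had been printed twice.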
P.S. I put > and < around the values in print to make the whitespace visible.
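As an alternative to the marker trick, the built-in repr() may be handy, since it also spells out newlines as escapes. A small sketch in Python 3 syntax; the value below is made up for illustration:

```python
value = "  800 \n"  # hypothetical value with stray whitespace and a newline

# Bracketing with markers shows the spaces, but the newline still
# breaks the output across two lines.
print(">" + value + "<")

# repr() returns the string in literal form, with the newline shown as \n.
shown = repr(value)
print(shown)
```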