First post of 2017
Getting started with Python web crawlers
System environment:
Windows 7
Python 3.5.2
pip packages:
requests, beautifulsoup4, jupyter
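The code below follows a tutorial video as its template.
The video dates from June 2014,
and its author used Python 2.7,
so the URL and some of the Python syntax in it differ slightly from what I use here.
Here is the code I actually ran (2017/1/06):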
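import requests
from bs4 import BeautifulSoup

# Search results for "coffee" in Los Angeles, CA
url = 'http://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA'
res = requests.get(url)
# The 'html5lib' parser needs the html5lib package installed (pip install html5lib)
soup = BeautifulSoup(res.content, 'html5lib')

# Each search result sits in a <div class="info"> block
get_data = soup.find_all("div", {"class": "info"})
for p in get_data:
    # Business name
    print(p.contents[0].find_all("a", {"class": "business-name"})[0].text)
    # The remaining fields are optional, so a missing one is skipped
    try:
        print(p.contents[1].find_all("span", {"class": "street-address"})[0].text)
    except (IndexError, AttributeError):
        pass
    try:
        # Drop the trailing comma after the city name
        print(p.contents[1].find_all("span", {"class": "locality"})[0].text.replace(",", ""))
    except (IndexError, AttributeError):
        pass
    try:
        print(p.contents[1].find_all("span", {"itemprop": "addressRegion"})[0].text)
    except (IndexError, AttributeError):
        pass
    try:
        print(p.contents[1].find_all("span", {"itemprop": "postalCode"})[0].text)
    except (IndexError, AttributeError):
        pass
    try:
        print(p.contents[1].find_all("div", {"class": "phones phone primary"})[0].text)
    except (IndexError, AttributeError):
        pass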
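The repeated try/except blocks above could probably be folded into one small helper. The following is only a sketch of that idea, not the tutorial's method: the names FIELDS, first_text and extract_listing are made up for illustration, and unlike the code above it searches the whole "info" block with find() instead of indexing p.contents.

```python
# Hypothetical field table: key -> (tag name, attribute filter).
# The selectors are the ones used in the listing above.
FIELDS = {
    "name": ("a", {"class": "business-name"}),
    "street": ("span", {"class": "street-address"}),
    "locality": ("span", {"class": "locality"}),
    "region": ("span", {"itemprop": "addressRegion"}),
    "postal_code": ("span", {"itemprop": "postalCode"}),
    "phone": ("div", {"class": "phones phone primary"}),
}


def first_text(tag, name, attrs, default=None):
    """Return the stripped text of the first matching descendant, or default."""
    match = tag.find(name, attrs)
    if match is None:
        return default
    return match.get_text(strip=True)


def extract_listing(info):
    """Collect every field of one search result into a dict.

    `info` is expected to be a BeautifulSoup Tag (one <div class="info">),
    but any object with a compatible find() method works.
    """
    record = {key: first_text(info, name, attrs)
              for key, (name, attrs) in FIELDS.items()}
    # Same clean-up as above: drop the comma after the city name.
    if record["locality"] is not None:
        record["locality"] = record["locality"].replace(",", "").strip()
    return record
```

With this in place the loop would shrink to something like `for p in get_data: print(extract_listing(p))`, and a missing field shows up as None instead of being silently skipped.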
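Follow-up:
1. The program only scrapes the first page of results; it could be extended to fetch the other pages.
2. The program could be put on GitHub.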
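For the first follow-up item, one possible approach is to build the URL for each results page and loop over them. This is a rough sketch under assumptions: the `page` query parameter is a guess about how the site paginates and has not been verified here, and build_page_url, scrape_pages, max_pages and delay are hypothetical names chosen for illustration.

```python
import time
from urllib.parse import urlencode

BASE_URL = "http://www.yellowpages.com/search"


def build_page_url(terms, location, page=1):
    """Build the search URL for one results page."""
    params = {"search_terms": terms, "geo_location_terms": location}
    if page > 1:
        # ASSUMPTION: the site accepts a "page" query parameter.
        params["page"] = page
    return BASE_URL + "?" + urlencode(params)


def scrape_pages(terms, location, max_pages=3, delay=1.0):
    """Collect business names from up to max_pages result pages."""
    # Third-party imports are kept local so build_page_url can be
    # used on its own without requests or bs4 installed.
    import requests
    from bs4 import BeautifulSoup

    names = []
    for page in range(1, max_pages + 1):
        res = requests.get(build_page_url(terms, location, page), timeout=10)
        res.raise_for_status()
        soup = BeautifulSoup(res.content, "html5lib")
        listings = soup.find_all("div", {"class": "info"})
        if not listings:
            break  # no results on this page, so stop early
        for info in listings:
            link = info.find("a", {"class": "business-name"})
            if link is not None:
                names.append(link.get_text(strip=True))
        time.sleep(delay)  # pause between requests to be polite
    return names
```

Calling `build_page_url("coffee", "Los Angeles, CA")` reproduces the first-page URL used earlier, and `scrape_pages("coffee", "Los Angeles, CA")` would then walk the next pages, provided the pagination assumption holds.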