[Notes] Getting Started with Python Web Crawlers

First post of 2017

I have started learning to write web crawlers in Python.

System environment:

Windows 7
Python 3.5.2

pip packages:
requests, beautifulsoup4, jupyter

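The code below follows the example from this tutorial video.

The video dates from June 2014

and its author uses Python 2.7,

so the URL and the Python syntax in it differ slightly from what I use here.

Here is the code as I implemented it (2017/1/06):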

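import requests
from bs4 import BeautifulSoup

# Yellow Pages search: coffee in Los Angeles, CA
url = 'http://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA'

res = requests.get(url)

# The 'html5lib' parser needs the html5lib package (pip install html5lib)
soup = BeautifulSoup(res.content, 'html5lib')

# Each search result sits in a <div class="info">
get_data = soup.find_all("div", {"class": "info"})

for p in get_data:
    # Business name
    print(p.contents[0].find_all("a", {"class": "business-name"})[0].text)

    # A listing may lack any of the fields below, so a missing one is skipped
    try:
        print(p.contents[1].find_all("span", {"class": "street-address"})[0].text)
    except (IndexError, AttributeError):
        pass

    try:
        # Drop the trailing comma after the city name
        print(p.contents[1].find_all("span", {"class": "locality"})[0].text.replace(",", ""))
    except (IndexError, AttributeError):
        pass

    try:
        print(p.contents[1].find_all("span", {"itemprop": "addressRegion"})[0].text)
    except (IndexError, AttributeError):
        pass

    try:
        print(p.contents[1].find_all("span", {"itemprop": "postalCode"})[0].text)
    except (IndexError, AttributeError):
        pass

    try:
        print(p.contents[1].find_all("div", {"class": "phones phone primary"})[0].text)
    except (IndexError, AttributeError):
        pass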
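
The repeated try/except blocks can also be folded into one small helper. What follows is only a sketch: the helper names `first_text` and `parse_listings` are my own, and `SAMPLE_HTML` is hypothetical markup loosely modelled on the class names used above, so the real page may be structured differently. It uses Python's built-in "html.parser" so that it runs offline without extra packages.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not copied from the real site.
SAMPLE_HTML = """
<div class="info">
  <h2><a class="business-name" href="#">Example Coffee</a></h2>
  <div>
    <span class="street-address">123 Main St</span>
    <span class="locality">Los Angeles,</span>
    <span itemprop="addressRegion">CA</span>
    <span itemprop="postalCode">90001</span>
  </div>
</div>
<div class="info">
  <h2><a class="business-name" href="#">No Address Cafe</a></h2>
</div>
"""


def first_text(tag, name, attrs):
    """Return the stripped text of the first matching descendant, or None."""
    found = tag.find(name, attrs)
    return found.get_text(strip=True) if found is not None else None


def parse_listings(html):
    """Turn every <div class="info"> into a dict; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for info in soup.find_all("div", {"class": "info"}):
        locality = first_text(info, "span", {"class": "locality"})
        listings.append({
            "name": first_text(info, "a", {"class": "business-name"}),
            "street": first_text(info, "span", {"class": "street-address"}),
            # Drop the trailing comma after the city name
            "locality": locality.replace(",", "") if locality else None,
            "region": first_text(info, "span", {"itemprop": "addressRegion"}),
            "postal_code": first_text(info, "span", {"itemprop": "postalCode"}),
        })
    return listings


if __name__ == "__main__":
    for listing in parse_listings(SAMPLE_HTML):
        print(listing)
```

Searching inside the whole result block, instead of indexing p.contents, makes the lookup less sensitive to whitespace nodes between tags, and returning None keeps a typo in a method name from being silently swallowed by an except clause.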

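Next steps:

1. The program only scrapes the first page of results; it could be extended to fetch the other pages as well.

2. The code could be put on GitHub.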
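
For the first item, one possible starting point is to generate the URL of each results page. This is a rough sketch under an assumption I have not verified: that the site accepts a `page` query parameter for pages after the first. The function names `build_search_url` and `build_page_urls` are made up for this example.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.yellowpages.com/search"


def build_search_url(terms, location, page=1):
    """Return the search URL for one page of results.

    ASSUMPTION: later pages are selected with a `page` query parameter.
    Check this against the site's own "next page" links before relying on it.
    """
    # A list of pairs keeps the parameter order stable on every Python 3 version
    params = [("search_terms", terms), ("geo_location_terms", location)]
    if page > 1:
        params.append(("page", page))
    return BASE_URL + "?" + urlencode(params)


def build_page_urls(terms, location, last_page):
    """Return the URLs for pages 1 through last_page, inclusive."""
    return [build_search_url(terms, location, page)
            for page in range(1, last_page + 1)]


if __name__ == "__main__":
    for page_url in build_page_urls("coffee", "Los Angeles, CA", 3):
        print(page_url)
```

Each generated URL can then be passed to requests.get() and parsed exactly like the first page. Adding a short time.sleep() between requests keeps the crawler from hitting the site too quickly.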