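Target site: 51job (前程无忧)
Data required: all job postings for a given keyword
Requirement: page through the results automatically and print them
Module used: requests
Analysis:
Keyword: python
URL: https://search.51job.com/list/180000,000000,0000,00,9,99,python,2,1.html
The keyword appears in the path right before ",2,", and the number before ".html" is the page number, so paging only requires changing that last number.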
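The URL pattern above can be sketched as a small helper. This is a minimal illustration, not part of the original script: the names `SEARCH_URL_TEMPLATE` and `build_search_url` are hypothetical, and percent-encoding the keyword with `urllib.parse.quote` is an assumption (the original only uses the ASCII keyword `python`, and the site may expect a different encoding for non-ASCII keywords).

```python
from urllib.parse import quote

# Template copied from the URL in the analysis; only the keyword and the
# page number vary. The other path segments are left exactly as in the post.
SEARCH_URL_TEMPLATE = (
    "https://search.51job.com/list/"
    "180000,000000,0000,00,9,99,{keyword},2,{page}.html"
)


def build_search_url(keyword: str, page: int) -> str:
    """Return the search URL for a keyword and a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Assumption: plain percent-encoding is enough for the keyword.
    return SEARCH_URL_TEMPLATE.format(keyword=quote(keyword), page=page)


if __name__ == "__main__":
    for page in range(1, 4):
        print(build_search_url("python", page))
```

With the keyword `python` and page 1 this reproduces the URL shown in the analysis, which makes it easy to check the template before crawling.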
Code:
import re

import requests

LIST_URL = "https://search.51job.com/list/180000,000000,0000,00,9,99,python,2,{}.html"
hd = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"}


def fetch(url):
    """Download a page and decode it as GBK to avoid garbled text."""
    response = requests.get(url, headers=hd)
    # Decode the raw bytes directly: re-encoding response.text raises a
    # TypeError when response.encoding is None.
    return response.content.decode("gbk", "ignore")


def first(pattern, text, default="N/A"):
    """Return the first regex capture in text, or default when nothing matches."""
    found = re.compile(pattern, re.S).findall(text)
    return found[0] if found else default


data = fetch(LIST_URL.format(1))
# Total number of postings, read from the "共...条职位" counter on the first page.
allline = first("共(.*?)条职位", data, "0")
# 50 postings per page; round up so an exact multiple of 50 adds no empty page.
allpage = (int(allline) + 49) // 50

for i in range(allpage):
    print("------------ Crawling page " + str(i + 1) + " ---------")
    thisdata = fetch(LIST_URL.format(i + 1))
    # Each result row has a checkbox followed by the link to the job detail page.
    job_url_pat = '<em class="check" name="delivery_em" onclick="checkboxClick.this."></em>.*?href="(.*?).html'
    job_url_all = re.compile(job_url_pat, re.S).findall(thisdata)
    for job_url in job_url_all:
        # Detail pages are fetched with the same headers as the list pages.
        detail = fetch(job_url + ".html")
        title = first('<h1 title="(.*?)"', detail)
        company = first('<p class="cname">.*?title="(.*?)"', detail)
        money = first('</h1><strong>(.*?)</strong>', detail)
        # Not every posting lists a work address, so fall back to "N/A".
        addr = first('上班地址:</span>(.*?)</p>', detail)
        print("-------------------")
        print(title)
        print(company)
        print(money)
        print(addr)
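The two pieces of logic that are easiest to get wrong, the page count and the field extraction, can be tried offline without touching the site. The sketch below is only an illustration: `page_count`, `first_match`, and `SAMPLE_HTML` are hypothetical names, and the sample markup is invented to resemble what the patterns in the script expect, not copied from a real 51job page. The page size of 50 comes from the original `//50`.

```python
import math
import re


def page_count(total_jobs: int, per_page: int = 50) -> int:
    """Number of result pages needed for total_jobs postings (rounded up)."""
    return math.ceil(total_jobs / per_page)


def first_match(pattern: str, text: str, default: str = "N/A") -> str:
    """Return the first capture group of pattern in text, or default."""
    match = re.search(pattern, text, re.S)
    return match.group(1).strip() if match else default


# Invented sample markup (an assumption, for illustration only).
SAMPLE_HTML = (
    '<h1 title="Python Developer">Python Developer</h1>'
    '<strong>1-1.5万/月</strong>'
    '<p class="cname"><a href="#" title="Example Co.">Example Co.</a></p>'
)

if __name__ == "__main__":
    print(page_count(100), page_count(101))
    print(first_match('<h1 title="(.*?)"', SAMPLE_HTML))
    print(first_match('<p class="cname">.*?title="(.*?)"', SAMPLE_HTML))
    print(first_match('</h1><strong>(.*?)</strong>', SAMPLE_HTML))
    # The sample has no address block, so the default is returned.
    print(first_match('上班地址:</span>(.*?)</p>', SAMPLE_HTML))
```

Rounding up with `math.ceil` avoids requesting one empty extra page when the total is an exact multiple of 50, and returning a default instead of indexing `[0]` keeps one posting with missing fields from stopping the whole crawl.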
- Article link: https://wentianhao.github.io/2020/03/01/%E6%8B%9B%E8%81%98%E4%BF%A1%E6%81%AF%E7%88%AC%E8%99%AB%E9%A1%B9%E7%9B%AE/