2021-02-18
How do you check an IP's class, and whether it works, in Python?
An IP address's class is determined by the value of its first octet: any address whose first octet falls between 0 and 127 is a Class A address, 128 to 191 is Class B, and 192 to 223 is Class C. A first octet between 224 and 239 marks a multicast address (Class D), and Class E (240 to 255) is reserved. Below we walk through checking whether a (proxy) IP address actually works.

Implementation:

```python
import requests
import random
import time

http_ip = [
    '118.163.13.200:8080',
    '222.223.182.66:8000',
    '51.158.186.242:8811',
    '171.37.79.129:9797',
    '139.255.123.194:4550',
]

for i in range(10):
    try:
        # Pick a random proxy and route both HTTP and HTTPS traffic through it.
        ip_proxy = random.choice(http_ip)
        proxy_ip = {
            'http': ip_proxy,
            'https': ip_proxy,
        }
        print('Using proxy IP:', proxy_ip)
        # httpbin.org/ip echoes back the IP the request arrived from,
        # so a successful response proves the proxy works.
        response = requests.get("http://httpbin.org/ip", proxies=proxy_ip).text
        print(response)
        print('Current IP is valid')
        time.sleep(2)
    except Exception as e:
        print(e.args[0])
        print('Current IP is invalid')
        continue
```

Output:

```
Using proxy IP: {'http': '118.163.13.200:8080', 'https': '118.163.13.200:8080'}
HTTPConnectionPool(host='118.163.13.200', port=8080): Max retries exceeded with url: http://httpbin.org/ip (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError(': Failed to establish a new connection: [WinError 10060] The connection attempt failed because the connected party did not properly respond after a period of time, or the connected host failed to respond.')))
Current IP is invalid
Using proxy IP: {'http': '51.158.186.242:8811', 'https': '51.158.186.242:8811'}
{"origin": "51.158.186.242"}
Current IP is valid
```

Feel free to try it out yourself. That wraps up this post; hopefully it helps you get the hang of it!
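The class boundaries described above translate directly into a first-octet check. Here is a minimal sketch (the function name `classify_ip` is just for illustration) that uses the standard-library `ipaddress` module to reject malformed addresses before reading the first octet:

```python
import ipaddress

def classify_ip(addr):
    """Return the class ('A'-'E') of a dotted-quad IPv4 address,
    or None if the string is not a valid IPv4 address."""
    try:
        # .packed gives the 4 raw bytes; byte 0 is the first octet.
        first_octet = ipaddress.IPv4Address(addr).packed[0]
    except ipaddress.AddressValueError:
        return None
    if first_octet <= 127:
        return 'A'   # 0-127: Class A
    elif first_octet <= 191:
        return 'B'   # 128-191: Class B
    elif first_octet <= 223:
        return 'C'   # 192-223: Class C
    elif first_octet <= 239:
        return 'D'   # 224-239: multicast (Class D)
    return 'E'       # 240-255: reserved (Class E)

print(classify_ip('118.163.13.200'))  # → A
print(classify_ip('192.168.1.1'))     # → C
print(classify_ip('999.1.1.1'))       # → None
```

Using `ipaddress` rather than a hand-rolled regex also catches out-of-range octets like `999.1.1.1` for free.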
2021-02-18
How can Python fetch free IP proxies?
When a crawler gets banned, what has really happened is that it triggered the site's anti-scraping measures and had its IP blocked. Without a pool of IPs to rotate through, the crawl cannot continue. To collect free proxies, we can scrape the public free-proxy listing sites as follows.

Scraping code:

```python
# coding=utf-8
import re
import urllib.request

proxy_list = []
total_proxy = 0


def get_proxy_ip():
    """Scrape xicidaili.com, pages 3-10."""
    global total_proxy
    headers = {
        'Host': 'www.xicidaili.com',
        'User-Agent': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Referer': 'http://www.xicidaili.com/',
    }
    request_list = ['http://www.xicidaili.com/nn/' + str(i) for i in range(3, 11)]
    for url in request_list:
        req = urllib.request.Request(url, headers=headers)
        html = urllib.request.urlopen(req).read().decode('utf-8')
        ip_list = re.findall(r'\d+\.\d+\.\d+\.\d+', html)
        port_list = re.findall(r'<td>\d+</td>', html)
        for i in range(len(ip_list)):
            total_proxy += 1
            ip = ip_list[i]
            port = re.sub(r'<td>|</td>', '', port_list[i])
            proxy_list.append('%s:%s' % (ip, port))
    return proxy_list


def get_proxy_ip1():
    """Scrape kuaidaili.com, pages 1-9."""
    global total_proxy
    headers = {
        'Host': 'www.kuaidaili.com',
        'User-Agent': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Referer': 'http://www.kuaidaili.com/',
    }
    request_list = ['https://www.kuaidaili.com/free/inha/%d/' % i for i in range(1, 10)]
    for url in request_list:
        req = urllib.request.Request(url, headers=headers)
        html = urllib.request.urlopen(req).read().decode('utf-8')
        ip_list = re.findall(r'\d+\.\d+\.\d+\.\d+', html)
        # This site tags port cells with a data-title attribute.
        port_list = re.findall(r'<td data-title="PORT">\d+</td>', html)
        for i in range(len(ip_list)):
            total_proxy += 1
            ip = ip_list[i]
            port = re.findall(r'\d+', port_list[i])[0]
            proxy_list.append('%s:%s' % (ip, port))
    return proxy_list


def get_proxy_ip2():
    """Scrape ip3366.net, pages 1-9."""
    global total_proxy
    headers = {
        'Host': 'www.ip3366.net',
        'User-Agent': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Referer': 'http://www.ip3366.net/',
    }
    request_list = ['http://www.ip3366.net/?stype=1&page=' + str(i) for i in range(1, 10)]
    for url in request_list:
        req = urllib.request.Request(url, headers=headers)
        html = urllib.request.urlopen(req).read().decode('utf-8')
        ip_list = re.findall(r'\d+\.\d+\.\d+\.\d+', html)
        port_list = re.findall(r'<td>\d+</td>', html)
        for i in range(len(ip_list)):
            total_proxy += 1
            ip = ip_list[i]
            port = re.sub(r'<td>|</td>', '', port_list[i])
            proxy_list.append('%s:%s' % (ip, port))
    return proxy_list


if __name__ == '__main__':
    get_proxy_ip()
    # get_proxy_ip1()
    get_proxy_ip2()
    print('Number of proxies fetched: %d' % total_proxy)
```

Code that accesses a site through one of the collected proxies (this is a fragment: `url` and `user_agent_list` are assumed to be defined elsewhere):

```python
proxy_ip = random.choice(proxy_list)
user_agent = random.choice(user_agent_list)
print(proxy_ip)
print(user_agent)
# Install an opener that routes HTTP traffic through the chosen proxy.
proxy_support = urllib.request.ProxyHandler({'http': proxy_ip})
opener = urllib.request.build_opener(proxy_support, urllib.request.HTTPHandler)
urllib.request.install_opener(opener)
req = urllib.request.Request(url)
req.add_header('User-Agent', user_agent)
c = urllib.request.urlopen(req, timeout=10)
```

With the steps above you can easily fetch free IP proxies in Python. If you're interested, give it a try!
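All three scrapers share the same parsing step: pull IPs with one regex, ports with another, then pair them up into `ip:port` strings. The sketch below isolates that step so it can be tried without any network access; the HTML fragment is made up for illustration, mimicking the table rows these sites render:

```python
import re

def parse_proxies(html):
    """Extract 'ip:port' strings from proxy-table HTML."""
    # Dotted quads only; ports inside plain <td> cells.
    ip_list = re.findall(r'\d+\.\d+\.\d+\.\d+', html)
    port_list = re.findall(r'<td>(\d+)</td>', html)
    return ['%s:%s' % (ip, port) for ip, port in zip(ip_list, port_list)]

# Hypothetical fragment: one table row per proxy.
sample = '''
<tr><td>118.163.13.200</td><td>8080</td></tr>
<tr><td>51.158.186.242</td><td>8811</td></tr>
'''
print(parse_proxies(sample))  # → ['118.163.13.200:8080', '51.158.186.242:8811']
```

Note that the port regex cannot accidentally match an IP cell, because `\d+` does not match the dots in a dotted quad; `zip` then keeps the two lists aligned row by row.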