做爬虫的时候,离不开代理ip,自己也抓取过免费代理,进行维护;说实在的,真不行,我抓取10个网站的免费代理,进行清洗,满足要求的就20个左右,非常不稳定,测试代理时还是能用,用来请求时就失效了,有的请求2次就废了;这种代理,自己学习如何维护代理还可以;爬取数据不行,生产环境更不行; 爬虫还是要继续的,那就买代理,目前我用过芝麻代理,阿布云,讯代理;都比较好用,芝麻代理每天都有免费的20个ip,这免费的20个ip,比自己抓的可是要好很多;买的代理相对稳定,但还是达不到要求,其实买的代理都是共用的,专用的太贵;现在只能搭建代理服务器;
ADSL拨号的好处
- ADSL拨号就是我们常见的拨号上网,家里的宽带都用过的话,就能明白,每次拨号运行商就会重新分配一个新的ip,其实我们的ipv4资源是有限的,做不到每个人拥有独立ip,等ipv6普及了,每台电脑都会拥有独立的ip,很期待; 有这个特性,只要将这台服务器配置成代理服务器,获取ip存进远端的服务器上,爬虫从远端服务器上获取动态ip作为代理,就可以爬取数据;问题就是每次拨号需要先停止再启动,中间有空白期,这个期间代理是用不了.
- 开始ADSL拨号服务器搭建
购买一个月vps服务器,选择如下图;如果需要使用拨号ip作为代理使用,不要选择混合拨号,要选择单个城市;
选择购买一个月,系统Centos,个人喜欢linux
进入会员中心-云主机管理,点击控制面板,进入
选择需要安装的系统,点击马上开始安装系统
安装提示
ssh连接后,是没有网的ping不通百度的,需要启动adsl;命令adsl-start
adsl停止命令:adsl-stop;重新启动ip就会发生变化.
- 以上就是vps动态拨号,每次拨号都会分配新的ip地址,下来就是如何做成代理ip
第一种: Squid
- 配置代理服务器,启动adsl拨号:
adsl-start #启动
- 安装squid和httpd
yum install -y squid
yum install httpd-tools -y
yum install openssl
- 修改配置文件
vi /etc/squid/squid.conf
将里面的内容全部替换成下面的内容,直接复制替换掉就ok
https://github.com/Derek520/ADSL/blob/master/squid.conf
- 生成用户名和密码
htpasswd -c /etc/squid/passwd testadsl #创建一个密码文件名为passwd,账号名为airoot的密码文件
# 回车之后提示输入密码,在此这里我设置的密码为 321654
# 注意密码不要超过8位
- 检查参数
squid -k parse
- 初始化参数
squid -z
2018/11/07 00:09:11| WARNING: (B) '127.0.0.1' is a subnetwork of (A) '127.0.0.1'
2018/11/07 00:09:11| WARNING: because of this '127.0.0.1' is ignored to keep splay tree searching predictable
2018/11/07 00:09:11| WARNING: You should probably remove '127.0.0.1' from the ACL named 'localhost'
2018/11/07 00:09:11| WARNING: (B) '127.0.0.1' is a subnetwork of (A) '127.0.0.1'
2018/11/07 00:09:11| WARNING: because of this '127.0.0.1' is ignored to keep splay tree searching predictable
2018/11/07 00:09:11| WARNING: You should probably remove '127.0.0.1' from the ACL named 'localhost'
2018/11/07 00:09:11| WARNING: (B) '::1' is a subnetwork of (A) '::1'
2018/11/07 00:09:11| WARNING: because of this '::1' is ignored to keep splay tree searching predictable
2018/11/07 00:09:11| WARNING: You should probably remove '::1' from the ACL named 'localhost'
2018/11/07 00:09:11| WARNING: (B) '::1' is a subnetwork of (A) '::1'
2018/11/07 00:09:11| WARNING: because of this '::1' is ignored to keep splay tree searching predictable
2018/11/07 00:09:11| WARNING: You should probably remove '::1' from the ACL named 'localhost'
2018/11/07 00:09:11| WARNING: (B) '127.0.0.0/8' is a subnetwork of (A) '127.0.0.0/8'
2018/11/07 00:09:11| WARNING: because of this '127.0.0.0/8' is ignored to keep splay tree searching predictable
2018/11/07 00:09:11| WARNING: You should probably remove '127.0.0.0/8' from the ACL named 'to_localhost'
2018/11/07 00:09:11| WARNING: (B) '0.0.0.0' is a subnetwork of (A) '0.0.0.0'
2018/11/07 00:09:11| WARNING: because of this '0.0.0.0' is ignored to keep splay tree searching predictable
2018/11/07 00:09:11| WARNING: You should probably remove '0.0.0.0' from the ACL named 'to_localhost'
2018/11/07 00:09:11| WARNING: (B) '0.0.0.0' is a subnetwork of (A) '0.0.0.0'
2018/11/07 00:09:11| WARNING: because of this '0.0.0.0' is ignored to keep splay tree searching predictable
2018/11/07 00:09:11| WARNING: You should probably remove '0.0.0.0' from the ACL named 'to_localhost'
2018/11/07 00:09:11| WARNING: (B) '::1' is a subnetwork of (A) '::1'
2018/11/07 00:09:11| WARNING: because of this '::1' is ignored to keep splay tree searching predictable
2018/11/07 00:09:11| WARNING: You should probably remove '::1' from the ACL named 'to_localhost'
2018/11/07 00:09:11| WARNING: (B) '::1' is a subnetwork of (A) '::1'
2018/11/07 00:09:11| WARNING: because of this '::1' is ignored to keep splay tree searching predictable
2018/11/07 00:09:11| WARNING: You should probably remove '::1' from the ACL named 'to_localhost'
[root@localhost ~]# 2018/11/07 00:09:12 kid1| WARNING: (B) '127.0.0.1' is a subnetwork of (A) '127.0.0.1'
2018/11/07 00:09:12 kid1| WARNING: because of this '127.0.0.1' is ignored to keep splay tree searching predictable
2018/11/07 00:09:12 kid1| WARNING: You should probably remove '127.0.0.1' from the ACL named 'localhost'
2018/11/07 00:09:12 kid1| WARNING: (B) '127.0.0.1' is a subnetwork of (A) '127.0.0.1'
2018/11/07 00:09:12 kid1| WARNING: because of this '127.0.0.1' is ignored to keep splay tree searching predictable
2018/11/07 00:09:12 kid1| WARNING: You should probably remove '127.0.0.1' from the ACL named 'localhost'
2018/11/07 00:09:12 kid1| WARNING: (B) '::1' is a subnetwork of (A) '::1'
2018/11/07 00:09:12 kid1| WARNING: because of this '::1' is ignored to keep splay tree searching predictable
2018/11/07 00:09:12 kid1| WARNING: You should probably remove '::1' from the ACL named 'localhost'
2018/11/07 00:09:12 kid1| WARNING: (B) '::1' is a subnetwork of (A) '::1'
2018/11/07 00:09:12 kid1| WARNING: because of this '::1' is ignored to keep splay tree searching predictable
2018/11/07 00:09:12 kid1| WARNING: You should probably remove '::1' from the ACL named 'localhost'
2018/11/07 00:09:12 kid1| WARNING: (B) '127.0.0.0/8' is a subnetwork of (A) '127.0.0.0/8'
2018/11/07 00:09:12 kid1| WARNING: because of this '127.0.0.0/8' is ignored to keep splay tree searching predictable
2018/11/07 00:09:12 kid1| WARNING: You should probably remove '127.0.0.0/8' from the ACL named 'to_localhost'
2018/11/07 00:09:12 kid1| WARNING: (B) '0.0.0.0' is a subnetwork of (A) '0.0.0.0'
2018/11/07 00:09:12 kid1| WARNING: because of this '0.0.0.0' is ignored to keep splay tree searching predictable
2018/11/07 00:09:12 kid1| WARNING: You should probably remove '0.0.0.0' from the ACL named 'to_localhost'
2018/11/07 00:09:12 kid1| WARNING: (B) '0.0.0.0' is a subnetwork of (A) '0.0.0.0'
2018/11/07 00:09:12 kid1| WARNING: because of this '0.0.0.0' is ignored to keep splay tree searching predictable
2018/11/07 00:09:12 kid1| WARNING: You should probably remove '0.0.0.0' from the ACL named 'to_localhost'
2018/11/07 00:09:12 kid1| WARNING: (B) '::1' is a subnetwork of (A) '::1'
2018/11/07 00:09:12 kid1| WARNING: because of this '::1' is ignored to keep splay tree searching predictable
2018/11/07 00:09:12 kid1| WARNING: You should probably remove '::1' from the ACL named 'to_localhost'
2018/11/07 00:09:12 kid1| WARNING: (B) '::1' is a subnetwork of (A) '::1'
2018/11/07 00:09:12 kid1| WARNING: because of this '::1' is ignored to keep splay tree searching predictable
2018/11/07 00:09:12 kid1| WARNING: You should probably remove '::1' from the ACL named 'to_localhost'
2018/11/07 00:09:12 kid1| Set Current Directory to /var/spool/squid
2018/11/07 00:09:12 kid1| Creating missing swap directories
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/00
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/01
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/02
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/03
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/04
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/05
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/06
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/07
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/08
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/09
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/0A
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/0B
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/0C
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/0D
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/0E
2018/11/07 00:09:12 kid1| Making directories in /tmp/squid/0F
- 以上就是初始化成功了,会卡在那里,直接ctrl+c退出就可以
- 启动服务
# 启动
systemctl start squid.service
# 停止
systemctl stop squid.service
# 重启
systemctl restart squid.service
启动若报错,提示:Redirecting to /bin/systemctl restart squid.service
/bin/systemctl restart squid.service
- 测试搭建的代理服务器
ifconfig
网卡:ppp0 ip xxx.xxx.xxx.xxx airoot:123456是第四步生成的用户和密码
curl -x xxx.xxx.xxx.xxx:3828 -U airoot:123456 www.baidu.com
第二种:TinyProxy
- 这个很简单,直接安装
yum install -y epel-release
yum update -y
yum install -y tinyproxy
- 配置TinyProxy
vim /etc/tinyproxy/tinyproxy.conf
#端口
Port 8888
# 注释掉是允许所有ip访问
#Allow 127.0.0.1
# 注释掉下面这行,隐藏掉Via请求头部
DisableViaHeader Yes
- 更多配置项,下面是列举一些配置文件默认的,不需要配置:
PidFile "/var/run/tinyproxy/tinyproxy.pid"
LogFile "/var/log/tinyproxy/tinyproxy.log"
LogLevel Info
MaxClients 100
MinSpareServers 5
MaxSpareServers 20
StartServers 10
- 命令,启动即可:
systemctl start tinyproxy.service
systemctl restart tinyproxy.service
systemctl stop tinyproxy.service
systemctl status tinyproxy.service
systemctl enable tinyproxy.service
- 测试
curl -x xxx.xxx.xxx.xxx:3828 www.baidu.com