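Obtaining proxy IPs

The article 网络爬虫攻防常见技巧 ("Common Techniques in Web Crawler Attack and Defense") describes several ways to obtain proxy IPs.

Writing a crawler with HttpClient

Step 1: add the dependency to pom.xml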

    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.5.3</version>
    </dependency>

Step 2: specify the proxy IP information
Example:

        ips.add("186.91.160.120:8080");
        ips.add("36.74.165.99:8080");
        ips.add("185.129.202.2:53281");
        ips.add("37.238.61.177:8080");
        ips.add("180.246.56.226:8080");
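
Each entry above is a single `host:port` string, while the crawler method in step 3 takes the host and the port as separate arguments. A minimal sketch of that split follows. It assumes `ips` is a `List<String>` (the original does not show its declaration); the `ProxyAddress` class and its `parse` method are hypothetical names introduced here for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: holds one proxy entry split into host and port.
class ProxyAddress {
    final String host;
    final int port;

    ProxyAddress(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Splits "host:port" at the last colon and checks that the port is in range.
    static ProxyAddress parse(String entry) {
        int idx = entry.lastIndexOf(':');
        if (idx <= 0 || idx == entry.length() - 1) {
            throw new IllegalArgumentException("Expected host:port but got: " + entry);
        }
        String host = entry.substring(0, idx).trim();
        // NumberFormatException (an IllegalArgumentException) is thrown for a non-numeric port.
        int port = Integer.parseInt(entry.substring(idx + 1).trim());
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range: " + port);
        }
        return new ProxyAddress(host, port);
    }

    public static void main(String[] args) {
        // Assumption: ips is a List<String>, filled as in the example above.
        List<String> ips = new ArrayList<>();
        ips.add("186.91.160.120:8080");
        ips.add("185.129.202.2:53281");
        for (String entry : ips) {
            ProxyAddress p = parse(entry);
            System.out.println(p.host + " -> " + p.port);
        }
    }
}
```

Splitting at the last colon rather than the first keeps the sketch tolerant of hosts that themselves contain a colon; for the plain IPv4 entries listed here either choice would behave the same.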

Step 3: write the crawler

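    // Imports needed by this method (HttpClient 4.5.x)
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import org.apache.http.Header;
    import org.apache.http.HttpHost;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.config.ConnectionConfig;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.impl.conn.DefaultProxyRoutePlanner;
    import org.apache.http.util.EntityUtils;

    // ClientProtocolException is a subclass of IOException, so declaring IOException is enough.
    private static void dovote(String ip, int port) throws IOException {
        // 1. The voting endpoint
        String url = "http://fendou.itcast.cn/article/updatevote";
        // 2. Prepare the request headers (getHeader() is defined elsewhere and not shown here)
        Map<String, String> headers = getHeader();
        // 3. Create the POST request object
        HttpPost httpPost = new HttpPost(url);
        // 4. Copy the prepared headers onto the request
        for (Map.Entry<String, String> header : headers.entrySet()) {
            httpPost.addHeader(header.getKey(), header.getValue());
        }
        // 5. Build an HttpClient that routes every request through the proxy
        HttpHost proxy = new HttpHost(ip, port);
        ConnectionConfig connectionConfig = ConnectionConfig.custom().setBufferSize(4128).build();
        DefaultProxyRoutePlanner routePlanner = new DefaultProxyRoutePlanner(proxy);
        // try-with-resources closes the response first and then the client, even on failure
        try (CloseableHttpClient hc = HttpClients.custom()
                     .setDefaultConnectionConfig(connectionConfig)
                     .setRoutePlanner(routePlanner)
                     .build();
             // 6. Send the vote request through the proxied HttpClient
             CloseableHttpResponse res = hc.execute(httpPost)) {
            // 7. Print the HTTP status code
            System.out.println("statusCode:" + res.getStatusLine().getStatusCode());
            // 8. Print every response header; a Set-Cookie header indicates success
            for (Header header : res.getAllHeaders()) {
                System.out.println(header);
            }
            // 9. Print the body: an empty string means the vote succeeded,
            //    while a non-empty body almost always means it failed
            String html = EntityUtils.toString(res.getEntity(), StandardCharsets.UTF_8);
            System.out.println("Response body: " + html);
        }
    }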
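
Public proxies can be unreachable or slow, so it may help to call the method once per entry and keep going when one of them fails. The following is only a sketch of such a loop, not part of the original crawler: `ProxyRotation`, `VoteAction` and `voteThroughAll` are hypothetical names, and `VoteAction` stands in for `dovote(ip, port)` so that the example runs without HttpClient on the classpath.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class ProxyRotation {
    // Hypothetical callback standing in for dovote(ip, port) from step 3.
    interface VoteAction {
        void vote(String ip, int port) throws IOException;
    }

    // Tries every "host:port" entry once and returns how many calls
    // finished without throwing an IOException.
    static int voteThroughAll(List<String> ips, VoteAction action) {
        int completed = 0;
        for (String entry : ips) {
            String[] parts = entry.split(":");
            try {
                action.vote(parts[0], Integer.parseInt(parts[1]));
                completed++;
            } catch (IOException e) {
                // A dead or refusing proxy should not stop the remaining attempts.
                System.err.println("Proxy " + entry + " failed: " + e.getMessage());
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        List<String> ips = Arrays.asList("186.91.160.120:8080", "36.74.165.99:8080");
        // Stub action for demonstration; the real crawler would pass a call to dovote here.
        int ok = voteThroughAll(ips,
                (ip, port) -> System.out.println("would vote via " + ip + ":" + port));
        System.out.println("completed: " + ok);
    }
}
```

The returned count only says how many requests completed without an I/O error. Whether a vote was actually accepted still has to be judged from the response, as described in comments 8 and 9 of the method above.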

HttpClient proxy example from the official manual

The official code is as follows:

HttpHost proxy = new HttpHost("someproxy", 8080);
DefaultProxyRoutePlanner routePlanner = new DefaultProxyRoutePlanner(proxy);
CloseableHttpClient httpclient = HttpClients.custom()
        .setRoutePlanner(routePlanner)
        .build();
