Fetching Novels from Qidian (Part 1): Scraping Chapter Content

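For ordinary users, the text of a novel on Qidian (起点中文网) cannot be copied and pasted with the mouse.
Readers who know a little about how web pages work might think of right-clicking to view the page source, but Qidian disables the right-click menu.
So getting at the text of a Qidian novel becomes a real problem for ordinary users.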

Next, we will use a bit of code to fetch the content of a Qidian novel.

Create a Java Maven project and add the following dependencies:

    <dependencies>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.3</version>
        </dependency>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.10.3</version>
        </dependency>
    </dependencies>

Write the code:

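import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class QidianCrawler {
    public static void main(String[] args) {
        // 1. Start URL: the first chapter of any novel you pick on Qidian
        String nextUrl = "https://read.qidian.com/chapter/_AaqI-dPJJ4uTkiRw_sFYA2/TPYBmAARksLgn4SMoDUcDQ2";
        // 2. Create one HTTP client and reuse it for every chapter
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            while (nextUrl != null) {
                String html;
                // 3. Execute the request; try-with-resources releases the connection
                try (CloseableHttpResponse response = httpClient.execute(new HttpGet(nextUrl))) {
                    // 4. Read the HTML document (UTF-8 is used if no charset is declared)
                    html = EntityUtils.toString(response.getEntity(), "UTF-8");
                }
                // 5. Parse the content and the next URL
                Document document = Jsoup.parse(html);
                // 5.1 Parse the chapter content
                Elements contents = document.select("[class=read-content j_readContent]");
                System.out.println(contents.text());
                System.out.println("------------------------------");
                // 5.2 Parse the next chapter URL; stop when the page has no such link
                Elements nextUrls = document.select("#j_chapterNext");
                String href = nextUrls.attr("href");
                nextUrl = href.isEmpty() ? null : "http:" + href;
                // Pause between requests
                Thread.sleep(2 * 1000);
            }
        } catch (Exception e) {
            // Report the URL that failed and stop instead of retrying it forever
            System.out.println(nextUrl);
            System.out.println(e);
        }
    }
}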
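
The loop above builds the next URL by prepending a fixed scheme to the href of the "next chapter" link, which only works when that href is protocol-relative (starts with //). As a possible alternative, the sketch below resolves the href against the URL of the current page using the standard java.net.URI class, so absolute, protocol-relative, and path-only links are handled the same way. The class name NextUrlResolver, the method resolveNext, and the chapter paths are made up for illustration and are not part of the original crawler.

```java
import java.net.URI;

class NextUrlResolver {

    // Resolves the href of the "next chapter" link against the current page URL.
    // Returns null when there is no usable link, which the caller can treat as "stop".
    static String resolveNext(String currentUrl, String href) {
        if (href == null || href.trim().isEmpty()) {
            return null;
        }
        return URI.create(currentUrl).resolve(href.trim()).toString();
    }

    public static void main(String[] args) {
        // Hypothetical chapter URLs, used only to show the three cases
        String current = "https://read.qidian.com/chapter/bookId/chapter1";
        System.out.println(resolveNext(current, "//read.qidian.com/chapter/bookId/chapter2"));
        System.out.println(resolveNext(current, "/chapter/bookId/chapter3"));
        System.out.println(resolveNext(current, ""));
    }
}
```

With this helper, a protocol-relative link inherits the scheme of the page it was found on, so a crawl that starts on https stays on https.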

Fetching VIP Chapters on Qidian

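To fetch VIP chapters, first log in through a browser, then copy the login information from the browser's request into the code. My guess is that the cookies are the part that matters.
Of course, the VIP chapter must be one you have already subscribed to.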

import java.io.IOException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class QidianVipCrawler {
    public static void main(String[] args) throws IOException, InterruptedException {
        // 1. Start URL of a VIP chapter
        String nextUrl = "http://vipreader.qidian.com/chapter/1004608738/346953690";
        // 2. Build the request and attach the login information copied from the browser
        HttpGet httpGet = new HttpGet(nextUrl);
        httpGet.setHeader("Accept","text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        // sdch is left out: HttpClient decodes gzip and deflate automatically, but not sdch
        httpGet.setHeader("Accept-Encoding","gzip, deflate");
        httpGet.setHeader("Accept-Language","zh-CN,zh;q=0.8");
        httpGet.setHeader("Connection","keep-alive");
        httpGet.setHeader("Cookie","e1=%7B%22pid%22%3A%22qd_P_vipread%22%2C%22eid%22%3A%22%22%2C%22l1%22%3A3%7D; e2=%7B%22pid%22%3A%22qd_P_vipread%22%2C%22eid%22%3A%22%22%2C%22l1%22%3A3%7D; _csrfToken=3ghtgsa7WWP84kbVUr2cYib4JNSViebbmuNPTmQd; newstatisticUUID=1508327144_264602072; qdrs=0%7C3%7C0%7C0%7C1; qdgd=1; bc=1004608738; pageOps=1; e1=%7B%22pid%22%3A%22qd_P_qdlogin%22%2C%22eid%22%3A%22%22%7D; e2=%7B%22pid%22%3A%22qd_P_qdlogin%22%2C%22eid%22%3A%22%22%7D; ywkey=yw4mUkcoXq2z; ywguid=800161839511; lrbc=1004608738%7C346953260%7C1; rcr=1004608738");
        httpGet.setHeader("Host","vipreader.qidian.com");
        httpGet.setHeader("Upgrade-Insecure-Requests","1");
        httpGet.setHeader("User-Agent","Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36");
        String html;
        // 3. Create the client and execute the request; both are closed automatically
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            // 4. Read the HTML document (UTF-8 is used if no charset is declared)
            html = EntityUtils.toString(response.getEntity(), "UTF-8");
        }
        // 5. Parse the content and the next URL
        Document document = Jsoup.parse(html);
        // 5.1 Parse the chapter content
        Elements contents = document.select("[class=read-content j_readContent]");
        System.out.println(contents.text());
        System.out.println("------------------------------");
        // 5.2 Parse the next chapter URL; null means the page has no such link
        Elements nextUrls = document.select("#j_chapterNext");
        String href = nextUrls.attr("href");
        nextUrl = href.isEmpty() ? null : "http:" + href;
        System.out.println(nextUrl);
        // Pause before any follow-up request
        Thread.sleep(2 * 1000);
    }
}
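
The request above carries the login state in one long hard-coded Cookie string, which is awkward to update when the session changes. A minimal sketch of one way around that, assuming the values are copied by hand from a logged-in browser session: keep the cookies as separate name/value pairs and join them when the request is built. The class CookieHeaderBuilder and its build method are hypothetical helpers, the values shown are placeholders, and which cookies Qidian actually requires for login is not confirmed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CookieHeaderBuilder {

    // Joins name/value pairs into the "name=value; name=value" form of a Cookie header.
    static String build(Map<String, String> cookies) {
        StringBuilder header = new StringBuilder();
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            if (header.length() > 0) {
                header.append("; ");
            }
            header.append(cookie.getKey()).append('=').append(cookie.getValue());
        }
        return header.toString();
    }

    public static void main(String[] args) {
        // Cookie names taken from the request above; the values are placeholders
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("_csrfToken", "YOUR_CSRF_TOKEN");
        cookies.put("ywkey", "YOUR_YWKEY");
        cookies.put("ywguid", "YOUR_YWGUID");
        System.out.println(build(cookies));
    }
}
```

The result can then be passed to httpGet.setHeader("Cookie", ...) in place of the literal string. Because the pairs live in a map, a name that is set twice keeps only its last value.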

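2 comments

  1. Are Ctrl+U and F12 both intercepted with JS listeners as well? Surely not. Would they really bother?

    边琪
    1. I just tried it: Ctrl+U is not disabled. F12 opens the developer tools built into non-IE browsers, so it cannot be blocked.

      毛祥溢
  2. After all these years of reading novels, I finally know where the ones I read came from. So those anti-piracy chapters show up because the pirate sites never re-scraped the corrected content... did I just give something away?

    周健钧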
