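How to get the complete HTML code from a website using jsoup or any other way
I am trying to get the HTML code of a website. If the page is small, like this one (https://abdelftahzowail.github.io/WriteUpsideDown/), I get the complete code, but if the page is large, like the pixel4k.com search page shown further down, I do not get the complete code.
I tried Jsoup
and HttpURLConnection,
but neither gave me the complete code.
Here is my code: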
Thread thread = new Thread(() -> {
    try {
        Document doc;
        doc = Jsoup.connect(editText.getText().toString())
                .header("Accept-Encoding","gzip,deflate")
                .userAgent("Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/30.0.1599.69 Safari/537.36")
                .maxBodySize(0)
                .timeout(0)
                .get();
        Log.i("IMPORTANT !!!!","doc ( "+editText.getText().toString()+" )\n"+doc);
    } catch (Exception e) {
        Log.i("IMPORTANT !!!!","error : "+e);
    }
});
thread.start();
This is the code I got from this site (https://www.pixel4k.com/page/1?s=deadpool):
<!doctype html>
<html class="no-js" lang="en-US" prefix="og: http://ogp.me/ns#">
<head>
<meta charset="UTF-8">
<title>You searched for deadpool - 4k Wallpapers,Hd Wallpapers,Desktop Wallpapers,Free Backgrounds Download,Widescreen Wallpapers</title>
<link rel="icon" href="https://www.pixel4k.com/wp-content/uploads/2018/09/favicon.ico" type="image/x-icon">
<link rel="apple-touch-icon" href="apple-touch-icon.png">
<meta name="viewport" content="width=device-width,initial-scale=1.0">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<link rel="stylesheet" type="text/css" media="all" href="https://www.pixel4k.com/wp-content/themes/pxxx/style.css">
<link rel="pingback" href="https://www.pixel4k.com/xmlrpc.php">
<meta name="google-site-verification" content="xHAo1q6wJG7bz-iw00VylrwaMabFjK_xSyU1jakgwaQ">
<meta name="wot-verification" content="317f71c46e1fb6060ce1">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js" type="f8f50ad6803275492fa5ce1d-text/javascript"></script>
<script type="f8f50ad6803275492fa5ce1d-text/javascript">(adsbygoogle=window.adsbygoogle||[]).push({google_ad_client:"ca-pub-2555268506534283",enable_page_level_ads:true});</script> <!--[if lt IE 9]>
<script src="https://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<meta name="robots" content="noindex,follow">
<link rel="next" href="https://www.pixel4k.com/search/deadpool/page/2">
<meta property="og:locale" content="en_US">
<meta property="og:type" content="object">
<meta property="og:title" content="You searched for deadpool - 4k Wallpapers,Widescreen Wallpapers">
<meta property="og:url" content="https://www.pixel4k.com/search/deadpool">
<meta property="og:site_name" content="4k Wallpapers,Widescreen Wallpapers">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="You searched for deadpool - 4k Wallpapers,Widescreen Wallpapers">
<script type="application/ld+json">{"@context":"https:\/\/schema.org","@type":"Person","url":"https:\/\/www.pixel4k.com\/","sameAs":[],"@id":"#person","name":"Mika"}</script>
<link rel="dns-prefetch" href="//ajax.googleapis.com">
<link rel="dns-prefetch" href="//www.pixel4k.com">
<link rel="alternate" type="application/rss+xml" title="4k Wallpapers,Widescreen Wallpapers » Feed" href="https://www.pixel4k.com/feed">
<link rel="alternate" type="application/rss+xml" title="4k Wallpapers,Widescreen Wallpapers » Comments Feed" href="https://www.pixel4k.com/comments/feed">
<link rel="alternate" type="application/rss+xml" title="4k Wallpapers,Widescreen Wallpapers » Search Results for “deadpool” Feed" href="https://www.pixel4k.com/search/deadpool/feed/rss2/">
<style type="text/css">img.wp-smiley,img.emoji{display:inline!important;border:none!important;box-shadow:none!important;height:1em!important;width:1em!important;margin:0 .07em!important;vertical-align:-.1em!important;background:none!important;padding:0!important}</style>
<link rel="stylesheet" id="wp-block-library-css" href="https://www.pixel4k.com/wp-includes/css/dist/block-library/style.min.css?ver=5.3.8" type="text/css" media="all">
<style id="rocket-lazyload-inline-css" type="text/css">.rll-youtube-player{position:relative;padding-bottom:56.23%;height:0;overflow:hidden;max-width:100%;background:#000;margin:5px}.rll-youtube-player iframe{position:absolute;top:0;left:0;width:100%;height:100%;z-index:100;background:0 0}.rll-youtube-player img{bottom:0;display:block;left:
However, this app (https://www.pixel4k.com/page/1?s=deadpool) gets the complete code.
What should I do?
Solution
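You are getting all of the data (with your code, both of your URLs produce the complete HTML), but the Android logger truncates long messages, so the Log.i call does not print everything.
If you write the HTML to a file instead of a log statement, you will most likely see that all of your data is there.
See "What is the size limit for Logcat and how to change its capacity?"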
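The two options above (splitting the output across several log entries, or writing it to a file) can be sketched as follows. This is only an illustration: the class name HtmlDump and its methods are hypothetical, and the chunk size used in the usage note is an assumed conservative value, since the exact Logcat entry limit depends on the device and build.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of the original question or answer. */
final class HtmlDump {
    private HtmlDump() {}

    /** Splits text into consecutive pieces of at most maxLen characters. */
    static List<String> chunk(String text, int maxLen) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < text.length(); start += maxLen) {
            int end = Math.min(text.length(), start + maxLen);
            parts.add(text.substring(start, end));
        }
        return parts;
    }

    /** Writes the whole text to a file as UTF-8 so it can be inspected untruncated. */
    static void writeToFile(String text, Path target) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }
}
```

In the question's code, the single Log.i call could then be replaced by a loop such as `for (String part : HtmlDump.chunk(doc.outerHtml(), 3000)) Log.i("IMPORTANT !!!!", part);`, where 3000 is an assumed chunk size chosen to stay well below the per-entry limit.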
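Another answer: I looked up the maximum length of a String in Java. According to Takahiko Kawasaki in this question, the limit is 65,535 bytes, but that limit applies to string literals compiled into a class file, not to strings built at run time.
A String created at run time can hold up to Integer.MAX_VALUE characters, memory permitting.
Since the method you are using writes the page's HTML code into a String
at run time, the size of the String is unlikely to be what cuts off a page larger than 65,536 bytes.
I don't know what you need to do with the HTML code after fetching it, so the following suggestion may not be enough for your needs, but: if you assemble the HTML piece by piece, have you tried collecting it in a StringBuffer
instead of a String?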
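As a quick check on the String length question, the sketch below builds a String well past 65,536 characters at run time. The class name StringLengthCheck and its repeat method are hypothetical and exist only for this illustration.

```java
/** Hypothetical check; not part of the original answers. */
final class StringLengthCheck {
    private StringLengthCheck() {}

    /** Builds a String made of count copies of the character c. */
    static String repeat(char c, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Calling `StringLengthCheck.repeat('x', 200000).length()` returns 200000, which suggests that a String holding a large page is not cut off at 65,536 characters.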