PHP cURL Web Scraping

I want to scrape the price of the phone from this URL: http://www.flipkart.com/apple-iphone-5s/p/itmdv6f75dyxhmt4?pid=MOBDPPZZDX8WSPAT

If you look at the page source, the price sits in the following SPAN:

<div class="pricing line">
        <div class="prices" itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">
                    <div>
                        <span class="selling-price omniture-field" data-omnifield="eVar48" data-eVar48="37500">Rs. 37,500</span> // Fetch this price
                    </div>
                    <span class="sticky-message">Selling Price</span>
            <meta itemprop="price" content="37,500">
            <meta itemprop="priceCurrency" content="INR">
        </div>
</div>

The code I have so far is:

<?PHP
$curl = curl_init('http://www.flipkart.com/apple-iphone-5s/p/itmdv6f75dyxhmt4?pid=MOBDPPZZDX8WSPAT');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);

$page = curl_exec($curl);

if(!empty($curl)){ //if any html is actually returned

    $pokemon_doc->loadHTML($curl);
    libxml_clear_errors(); //remove errors for yucky html

    $pokemon_xpath = new DOMXPath($pokemon_doc);

    //get all the h2's with an id
    $pokemon_row = $pokemon_xpath->query('//h2[@id]');

    if($pokemon_row->length > 0){
        foreach($pokemon_row as $row){
            echo $row->nodeValue . "<br/>";
        }
    }
}

else
    print "Not found";
?>

It shows this error:

Fatal error: Call to a member function loadHTML() on a non-object in
D:\xampp\htdocs\jiteen\PHP-scrape\PHPScrape.PHP on line 9

What should I do? I am unable to trace the error.

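Solution:

First, you forgot to instantiate the DOMDocument class (at least in the code shown in this question), which is why calling loadHTML() on $pokemon_doc fails with "non-object". The question's code also passes the cURL handle $curl to loadHTML(); it should pass the response body stored in $page.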

$curl = curl_init('http://www.flipkart.com/apple-iphone-5s/p/itmdv6f75dyxhmt4?pid=MOBDPPZZDX8WSPAT');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
// Send a browser-like User-Agent, since some sites refuse requests without one
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');

$page = curl_exec($curl); // the HTML as a string, or false on failure

if(!empty($page)){ //if any html is actually returned (test the response, not the handle)

    $pokemon_doc = new DOMDocument; // the instantiation that was missing in the question
    libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    $pokemon_doc->loadHTML($page);
    libxml_clear_errors(); //remove errors for yucky html

    $pokemon_xpath = new DOMXPath($pokemon_doc);

    // loadHTML() lowercases element names, so the XPath must use "meta"
    $price = $pokemon_xpath->evaluate('string(//div[@class="prices"]/meta[@itemprop="price"]/@content)');
    echo $price;

    // text of the selling-price span, e.g. "Rs. 37,500"
    $rupees = $pokemon_xpath->evaluate('string(//div[@class="prices"]/div/span)');
    echo $rupees;
}
else {
    print "Not found";
}
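
To try the parsing step without hitting the network, the same DOMDocument/DOMXPath approach can be run against the markup quoted in the question. The following is only a minimal sketch: the helper name extract_price() and the decision to return the price as an integer are assumptions made for illustration, not part of the original answer, and the live page's markup may differ from the snippet.

```php
<?php
// Hypothetical helper (not from the original answer): returns the selling
// price found in $html as an integer, or null when no price is present.
function extract_price(string $html): ?int
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();

    // Collect libxml parse errors internally instead of emitting warnings,
    // then restore whatever setting was active before.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // contains() is a substring match, so it tolerates extra classes
    // such as "omniture-field" on the same span.
    $raw = $xpath->evaluate('string(//span[contains(@class, "selling-price")])');

    // Take the first run of digits and thousands separators, e.g. "37,500".
    if (!preg_match('/\d[\d,]*/', $raw, $m)) {
        return null;
    }

    return (int) str_replace(',', '', $m[0]);
}

// Sample markup reduced from the snippet in the question.
$sample = <<<'HTML'
<div class="pricing line">
  <div class="prices" itemprop="offers">
    <div>
      <span class="selling-price omniture-field">Rs. 37,500</span>
    </div>
    <meta itemprop="price" content="37,500">
  </div>
</div>
HTML;

echo extract_price($sample), PHP_EOL; // prints 37500
```

Reading the span text and stripping the separators keeps the result usable for comparisons or arithmetic. Anything after a decimal point is dropped, which may or may not suit other price formats.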

