Scraping tags from an HTML page

The following code uses PHP's DOMDocument class to extract anchor and image links from an HTML page.

<?php
// get contents of your html page as a string
$sText = file_get_contents('mypage.html');

// create a DOM document
$dom = new DOMDocument;

// load html into DOM. @ suppresses the warnings raised by malformed markup
@$dom->loadHTML($sText);

// collect all anchor and image elements
$aLinkTags = $dom->getElementsByTagName('a');
$aImgTags = $dom->getElementsByTagName('img');

// put the links in an array, keyed by link text
// (links that share the same text overwrite each other)
$aLinks = array();
foreach ($aLinkTags as $sLinkTag) {
    $aLinks[$sLinkTag->nodeValue] = $sLinkTag->getAttribute('href');
}
print_r($aLinks);

// put the image sources in an array
$aImg = array();
foreach ($aImgTags as $sImgTag) {
    // img elements have no text content and use src, not href
    $aImg[] = $sImgTag->getAttribute('src');
}
print_r($aImg);
?>

This code scrapes anchor and image links. To cover other HTML tags, pass a different tag name to getElementsByTagName() and read the attribute that holds the URL.
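
As a rough sketch of that kind of extension, the tag name and attribute can be passed as parameters. The function name scrapeAttribute() and the sample markup are made up for this example and are not taken from ScrapeContent.php. The sketch also assumes PHP 7 or later for the type declarations, and it swaps the @ operator for libxml_use_internal_errors(), which keeps parse warnings out of the output without hiding other errors.

```php
<?php
// Hypothetical helper (not part of the original snippet): returns the value
// of one attribute from every element with the given tag name.
function scrapeAttribute(string $sHtml, string $sTag, string $sAttr): array
{
    // Collect libxml parse warnings internally instead of suppressing with @
    $bPrevious = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($sHtml);

    // Discard the collected warnings and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($bPrevious);

    $aValues = array();
    foreach ($dom->getElementsByTagName($sTag) as $oNode) {
        // Skip elements that do not carry the attribute
        if ($oNode->hasAttribute($sAttr)) {
            $aValues[] = $oNode->getAttribute($sAttr);
        }
    }
    return $aValues;
}

// Sample markup, invented for this example
$sHtml = '<html><head><link rel="stylesheet" href="style.css"></head>'
       . '<body><script src="app.js"></script><a href="/home">Home</a>'
       . '<img src="logo.png" alt="Logo"></body></html>';

print_r(scrapeAttribute($sHtml, 'script', 'src'));
print_r(scrapeAttribute($sHtml, 'link', 'href'));
```

The same call works for the tags in the original code, for example scrapeAttribute($sHtml, 'a', 'href') or scrapeAttribute($sHtml, 'img', 'src').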

See ScrapeContent.php
