The following code uses PHP's DOM extension to extract links.
// get contents of your html page as a string
$sText = file_get_contents('mypage.html');
// create a DOM document
$dom = new DOMDocument;
// load html into DOM. @ suppresses parsing errors
@$dom->loadHTML($sText);
// get all anchor and image tags
$aLinkTags = $dom->getElementsByTagName('a');
$aImgTags = $dom->getElementsByTagName('img');
// put the links in an array
$aLinks = array();
foreach ($aLinkTags as $sLinkTag) {
    // key: link text, value: href (links with identical text overwrite each other)
    $aLinks[$sLinkTag->nodeValue] = $sLinkTag->getAttribute('href');
}
print_r($aLinks);
// put the image sources in an array
$aImg = array();
foreach ($aImgTags as $sImgTag) {
    // img elements have no text content and use src, not href
    $aImg[] = $sImgTag->getAttribute('src');
}
print_r($aImg);
This code scrapes anchor links and image sources. It can be extended to other HTML tags in the same way, by changing the tag name and the attribute that is read.
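
As a rough sketch of how that extension might look, the tag name and attribute can be passed as parameters. The helper name `extractAttributes` and the sample HTML string are made up for illustration and are not part of the original code. This version also uses `libxml_use_internal_errors()` instead of `@` to keep parse warnings quiet, which is one possible alternative rather than a requirement.

```php
<?php
// Hypothetical helper: collect one attribute from every element
// with the given tag name in an HTML string.
function extractAttributes(string $sHtml, string $sTag, string $sAttr): array
{
    $dom = new DOMDocument;

    // Collect libxml parse warnings internally instead of printing them
    $bPrevious = libxml_use_internal_errors(true);
    $dom->loadHTML($sHtml);
    libxml_clear_errors();
    libxml_use_internal_errors($bPrevious);

    $aValues = array();
    foreach ($dom->getElementsByTagName($sTag) as $oNode) {
        // Skip elements that do not carry the attribute at all
        if ($oNode->hasAttribute($sAttr)) {
            $aValues[] = $oNode->getAttribute($sAttr);
        }
    }
    return $aValues;
}

// Sample input, assumed for the example only
$sHtml = '<html><head><link rel="stylesheet" href="s.css"></head>'
       . '<body><a href="/a">A</a><img src="x.png">'
       . '<script src="app.js"></script></body></html>';

print_r(extractAttributes($sHtml, 'a', 'href'));
print_r(extractAttributes($sHtml, 'img', 'src'));
print_r(extractAttributes($sHtml, 'link', 'href'));
print_r(extractAttributes($sHtml, 'script', 'src'));
```

Returning a plain list rather than keying by `nodeValue` avoids losing entries when two elements share the same text, at the cost of dropping the link text.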