PHP code to scrape tags from an HTML page

#dom, #PHP, #xml

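The following code uses PHP's DOM extension (the DOMDocument class) to extract the link targets of anchors and the sources of images from an HTML page.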

<?php

// get the contents of your HTML page as a string
$sText = file_get_contents('mypage.html');

// create a DOM document
$dom = new DOMDocument;

// load the HTML into the DOM; @ suppresses the warnings loadHTML() emits for malformed markup
@$dom->loadHTML($sText);

// collect all anchor (<a>) and image (<img>) elements
$aLinkTags = $dom->getElementsByTagName('a');
$aImgTags = $dom->getElementsByTagName('img');

// put the link targets in an array, keyed by link text
// (links that share the same text overwrite one another)
$aLinks = array();
foreach ($aLinkTags as $oLinkTag) {
    $aLinks[trim($oLinkTag->nodeValue)] = $oLinkTag->getAttribute('href');
}
print_r($aLinks);

// put the image sources in an array; <img> has no text content,
// so read its src attribute rather than href
$aImg = array();
foreach ($aImgTags as $oImgTag) {
    $aImg[] = $oImgTag->getAttribute('src');
}
print_r($aImg);

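This code collects the href of every anchor (<a>) and the src of every image (<img>). The same approach extends to other HTML tags: call getElementsByTagName() with the tag name and read the attribute you need with getAttribute().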
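One way to generalize the pattern is to pass the tag and attribute names as parameters. The sketch below wraps this in a hypothetical helper, scrapeAttribute(), which is not a PHP built-in and not part of the original code. It also replaces the @ operator with libxml_use_internal_errors(). That function buffers the parser's complaints so they can be read back with libxml_get_errors() instead of being silenced outright. The sample HTML string is made up for illustration.

```php
<?php
// Hypothetical helper (not a built-in): return the value of $sAttr for every
// <$sTag> element in $dom that actually has that attribute.
function scrapeAttribute(DOMDocument $dom, string $sTag, string $sAttr): array
{
    $aValues = array();
    foreach ($dom->getElementsByTagName($sTag) as $oTag) {
        if ($oTag->hasAttribute($sAttr)) {
            $aValues[] = $oTag->getAttribute($sAttr);
        }
    }
    return $aValues;
}

// Made-up sample markup; in practice this would come from file_get_contents().
$sHtml = '<html><head>'
    . '<link rel="stylesheet" href="style.css">'
    . '<script src="app.js"></script>'
    . '</head><body>'
    . '<a href="/about">About</a> <a name="top">Top</a>'
    . '<img src="logo.png" alt="Logo">'
    . '</body></html>';

// Buffer libxml parser errors instead of emitting PHP warnings.
$bPrevious = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($sHtml);

// Report anything the parser complained about, then restore the old setting.
foreach (libxml_get_errors() as $oError) {
    echo 'Parser warning on line ', $oError->line, ': ', trim($oError->message), PHP_EOL;
}
libxml_clear_errors();
libxml_use_internal_errors($bPrevious);

print_r(scrapeAttribute($dom, 'a', 'href'));      // skips <a name="top">, which has no href
print_r(scrapeAttribute($dom, 'img', 'src'));
print_r(scrapeAttribute($dom, 'script', 'src'));
print_r(scrapeAttribute($dom, 'link', 'href'));
```

Each call returns a plain indexed array. Elements that lack the requested attribute are skipped rather than stored as empty strings, and repeated values are all kept.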
