• Drupal CMS Developer
  • Website speed optimisation
  • Drupal 6 to Drupal 8 migrations


Welcome. I am a web developer based in Madrid, Spain, originally from the UK. I studied Computer Science & eBusiness at Loughborough University. I specialise in Content Management System websites.

PHP HTML Page Parser - Design conversion script

Something I am very proud of is my PHP HTML Page Parser. I wrote it from scratch in PHP to help me convert over 5000 web pages from an old design into a new one. It took several months to perfect, but I now have a working script that converts everything in a given folder into the new design by parsing the old HTML and putting the text content into a new template.

Webpage before conversion Webpage after conversion



Using the Script

To use the PHP HTML Parser script, you upload a single PHP file onto the server where all of your pages are. You enter the folder you want it to parse and convert (all sub folders are included), tell it which file types to ignore, tell it where your template file is, and click submit. Each page then changes from "before" to "after". The one drawback of the script is that it relies on the pages sharing common features. On my pages, for example, the text always sits between an <h1> tag and an <hr> tag, so the parser takes everything between those two tags.
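
The conversion step described above can be sketched roughly as follows. This is a minimal Python illustration, not the original PHP script. The function names (extract_content, apply_template, convert_folder), the {{content}} placeholder, the default ignored file types, and the choice to start the content after the closing </h1> are all assumptions made for the example.

```python
import re
from pathlib import Path

# Hypothetical marker showing where the content goes in the new template.
PLACEHOLDER = "{{content}}"

# Assumption: the content starts after the first closing </h1> and ends at the next <hr>.
CONTENT_RE = re.compile(r"</h1\s*>(.*?)<hr\b[^>]*>", re.IGNORECASE | re.DOTALL)


def extract_content(old_html):
    """Return the markup between the first </h1> and the following <hr>, or None."""
    match = CONTENT_RE.search(old_html)
    return match.group(1).strip() if match else None


def apply_template(template, content):
    """Drop the extracted content into the new template."""
    return template.replace(PLACEHOLDER, content)


def convert_folder(root, template, ignore_exts=(".css", ".js")):
    """Convert every page under root in place, sub folders included.

    Returns the files that were left untouched because they lack the
    expected <h1> ... <hr> structure, so they can be checked by hand.
    """
    skipped = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() in ignore_exts:
            continue
        old_html = path.read_text(encoding="utf-8", errors="replace")
        content = extract_content(old_html)
        if content is None:
            skipped.append(path)
            continue
        path.write_text(apply_template(template, content), encoding="utf-8")
    return skipped
```

Returning the skipped files reflects the drawback mentioned above: a page without the common <h1> and <hr> markers cannot be converted automatically, so it is safer to report it than to guess.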

Other uses and Functions

The main reason I created the PHP HTML parser was simply to take the text content from the old page and put it into the new template, but I added several more functions to reduce the amount of work I had to do.

These functions include:

  • Copies the page title into the new template
  • Copies across all the meta information
  • Parses and converts a whole site in one go (pages need checking and slight modification afterwards)
  • Strips all tags apart from those stated (<img>, <p>, <script>, etc.)
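
As a rough illustration of the functions listed above, the following Python sketch copies the title and meta tags out of an old page and strips every tag that is not on an allow-list. It is not the original PHP code. The names page_title, meta_tags, strip_tags, TagFilter and ALLOWED_TAGS are hypothetical, and the allow-list simply reuses the example tags from the list.

```python
import re
from html.parser import HTMLParser

# Assumed allow-list, taken from the example tags mentioned in the list above.
ALLOWED_TAGS = {"img", "p", "script"}


def page_title(html):
    """Return the text of the first <title> element, or an empty string."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


def meta_tags(html):
    """Return every <meta ...> tag exactly as written in the old page."""
    return re.findall(r"<meta\b[^>]*>", html, re.IGNORECASE)


class TagFilter(HTMLParser):
    """Re-emit text and allow-listed tags; drop other tags but keep their text."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=False)
        self.allowed = allowed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img .../> are emitted once, unchanged.
        if tag in self.allowed:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_tags(html, allowed=ALLOWED_TAGS):
    """Remove every tag that is not in the allow-list, keeping the text."""
    tag_filter = TagFilter(allowed)
    tag_filter.feed(html)
    tag_filter.close()
    return "".join(tag_filter.parts)
```

For example, strip_tags('<div><p>Hi <b>there</b></p></div>') keeps the paragraph and its text but drops the <div> and <b> tags. Comments in the old page are dropped as well, since the filter does not re-emit them.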

Interested in the Script?

This script is best suited to extracting the content from a static website ready for a content management system, although it can be used in many other situations. Because the script is quite complex, I cannot offer it for download or license the software. I will only agree to contract the use of this script on the following terms:

  • All pages required for content extraction should be zipped and available for download from a URL (just HTML files)
  • For a design conversion, the download should also include the new template
  • Once all pages have been converted/content extracted, I shall sell back the results.
  • You must understand that the system is 100% automatic and will only work fully if your HTML has a consistent structure
  • You may need to make design tweaks to the content afterwards
  • The charge for the conversion/extraction is £0.50 per page.