Posted on Monday 9th of April 2007 at 12:23 in Tutorials

Writing a PHP Google Sitemap generator without using fopen

The importance of a Google sitemap is widely discussed, but if you don't use a pre-built solution (like WordPress), how do you keep your sitemap.xml file up to date? This tutorial explains how to write your own sitemap.xml generator in PHP (without using fopen).

(And yes, I'm fully aware the formatting of the comments is rather nasty, but it'll do.)

The Google sitemap is a sitemap.xml file that you place in the document root of your website and then tell Google about. It helps the popular search engine index the pages on your site more accurately, rather than relying on Googlebot to do all the hard work.

Automate the process
There are plugins for WordPress that update the sitemap.xml file every time you publish (as there are for other content-managed solutions), but if you build your own site you have to generate the file manually. Previously I relied on giving my URL to a sitemap generator, saving its output and uploading it to the server via FTP. I got bored of this, so I wrote my own generator that I could give to Google and then forget about. I'll explain how...

#1 - Set the header
Create a new file called sitemap.php. First, set the Content-Type header so that when sitemap.php is viewed, its output is treated as XML:
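A minimal sketch of this step. It assumes nothing is echoed before the call, because PHP can only send headers before any output; the application/xml media type is one reasonable choice (text/xml also works).

```php
<?php
// sitemap.php: "<?php" should be the very first thing in the file, because
// any whitespace or output before header() can trigger a
// "headers already sent" warning and the header will not be sent.

// Tell browsers (and Googlebot) that the response body is XML, not HTML.
header('Content-Type: application/xml; charset=utf-8');
```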



#2 - Open the connection to your database and do the query
However you store your content, it will almost certainly live in a database, so the normal PHP connection/query code applies. This code is lifted straight off my site:
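As a generic sketch of this step, the block below uses PDO (the old mysql_* functions were removed in PHP 7). The `articles` table, its `id`, `url_title`, `date_posted` and `published` columns, and the connection credentials are all hypothetical placeholders; swap in your own schema.

```php
<?php
// Returns every published article, newest first, as associative arrays.
// Hypothetical schema: articles(id, url_title, date_posted, published).
function fetchPublishedArticles(PDO $db): array
{
    $sql = 'SELECT id, url_title, date_posted
              FROM articles
             WHERE published = 1
             ORDER BY date_posted DESC';

    return $db->query($sql)->fetchAll(PDO::FETCH_ASSOC);
}

// Placeholder DSN and credentials: replace with your own.
function connectToSiteDatabase(): PDO
{
    return new PDO(
        'mysql:host=localhost;dbname=mysite;charset=utf8mb4',
        'db_user',
        'db_password',
        [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
    );
}

// In sitemap.php:
// $db = connectToSiteDatabase();
// $articles = fetchPublishedArticles($db);
```

Filtering on a published flag in SQL keeps drafts out of the sitemap without any extra PHP logic.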



#3 - Start the XML document
Now that all the current, published articles are selected into a result set, we need to write the start of the XML document:
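A minimal sketch of the document's opening, assuming the sitemaps.org protocol version 0.9 (the format Google reads). Every `<url>` entry goes inside the `<urlset>` element opened here.

```php
<?php
// XML declaration plus the opening <urlset> element of the sitemap protocol.
function sitemapOpen(): string
{
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
}

// In sitemap.php, right after the header() call:
// echo sitemapOpen();
```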



#4 - Work out the URL-Product
This is the part most likely to differ depending on how you've built your site. At Seopher.com I use URL rewriting to associate a clean URL with each article's ID, so what is really "http://www.seopher.com/viewarticle.php?id=5" becomes "http://www.seopher.com/articles/an_article_title". So the code I use to produce the URL-product is:
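A minimal sketch of building the clean URL, assuming each row stores its slug in a hypothetical `url_title` column and that the rewrite rules map /articles/<slug> back to the article.

```php
<?php
// Builds a clean URL such as http://www.seopher.com/articles/an_article_title.
// $row['url_title'] is an assumed column holding the article's slug.
function cleanArticleUrl(array $row, string $base = 'http://www.seopher.com/articles/'): string
{
    // rawurlencode() escapes any characters in the slug that are not URL-safe.
    return $base . rawurlencode($row['url_title']);
}
```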



If instead you're using WordPress-esque conventions (i.e. www.seopher.com/article.php?id=5), the code would look more like:
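A minimal sketch for query-string URLs, assuming the article's numeric ID is in a hypothetical `id` column and the page is article.php.

```php
<?php
// Builds a URL such as http://www.seopher.com/article.php?id=5.
function queryStringArticleUrl(array $row, string $base = 'http://www.seopher.com/article.php'): string
{
    // Casting to int guarantees the ID is numeric; http_build_query() does the encoding.
    return $base . '?' . http_build_query(['id' => (int) $row['id']]);
}
```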



So all you need to do is work out how to build a real URL from your database content, then move on to step #5.

#5 - Output a list of your databased URLs
Now that you've worked out how to create a working URL-product for your content, it's time to output it in an XML format that Google can make sense of.
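A minimal sketch of the loop, assuming rows with hypothetical `url_title` and `date_posted` columns (the latter a DATETIME string such as "2007-04-09 12:23:00") and the clean-URL scheme from step #4.

```php
<?php
// Formats one <url> entry of the sitemap.
function sitemapUrlEntry(string $loc, string $datePosted): string
{
    $timestamp = strtotime($datePosted);
    if ($timestamp === false) {
        throw new InvalidArgumentException("Unrecognised date: {$datePosted}");
    }
    // <lastmod> uses the W3C Datetime format; YYYY-MM-DD is sufficient.
    $lastmod = date('Y-m-d', $timestamp);

    // URLs must be entity-escaped in XML (for example & becomes &amp;).
    $loc = htmlspecialchars($loc, ENT_QUOTES | ENT_XML1, 'UTF-8');

    return "  <url>\n"
         . "    <loc>{$loc}</loc>\n"
         . "    <lastmod>{$lastmod}</lastmod>\n"
         . "  </url>\n";
}

// Loops over the result set and concatenates one entry per article.
function sitemapUrlEntries(array $rows): string
{
    $xml = '';
    foreach ($rows as $row) {
        $loc = 'http://www.seopher.com/articles/' . rawurlencode($row['url_title']);
        $xml .= sitemapUrlEntry($loc, $row['date_posted']);
    }
    return $xml;
}
```

If your database stores the date as a Unix timestamp instead, pass it straight to `date('Y-m-d', ...)` and skip `strtotime()`.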



The above code loops through the result set and outputs each article in the XML format that Google expects. The "lastmod" element is populated by reformatting the timestamp you *should* be storing for when each article was posted (strictly, "lastmod" means the date the page was last modified, so use an "updated" timestamp if you keep one). The "loc" element is populated using the URL-product we made earlier.

#6 - Close everything down
It's just a case of closing the connection and ending the XML document.
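A minimal sketch of this last step, assuming the PDO connection from the step #2 sketch is held in a variable called `$db`.

```php
<?php
// Closes the <urlset> element opened at the start of the document.
function sitemapClose(): string
{
    return "</urlset>\n";
}

// In sitemap.php:
// echo sitemapClose();
// A PDO connection is closed once no references to the object remain,
// so assigning null closes it explicitly (PHP also closes it when the
// script ends):
// $db = null;
```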



That's it as far as outputting everything in XML format. Upload sitemap.php to your server, load it in your browser and you should (hopefully) see a mess of all your content. View the page source and you should see something like:



Obviously that's my sitemap.php output (which has more than two items in it, I might add), but you should see something to that effect. If you don't, you'll need to troubleshoot what's causing the problem. Once sitemap.php is producing output like the example above, you can move on to step #7.

#7 - Modify the .htaccess file so sitemap.php becomes sitemap.xml
This is a crucial step because Google needs to see a sitemap.xml file and all you've got is sitemap.php. To fix this, edit (or create) a .htaccess file with the following logic in it:
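A minimal sketch, assuming Apache with mod_rewrite enabled, a host that allows .htaccess overrides, and a .htaccess file in the same directory as sitemap.php. It rewrites only sitemap.xml rather than every .xml file, so any genuine XML files on the site keep working.

```apache
# Switch on mod_rewrite for this directory.
RewriteEngine On

# Answer requests for sitemap.xml with the output of sitemap.php.
# The visible URL stays sitemap.xml; no redirect is sent.
RewriteRule ^sitemap\.xml$ sitemap.php [L]
```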



What this does is turn on the rewrite engine (which, essentially, lets you change how requested URLs are handled) and add a rule so that a request for file.xml is answered by file.php.

Now, if you request sitemap.xml in your browser, you'll see the output from sitemap.php. That's crucial, because when Google fetches sitemap.xml it gets live data from your PHP script, so your sitemap will never be out of date.

Conclusion: what you should have
A sitemap.php file on your server that you can reach by entering either "www.yourwebsite.com/sitemap.php" or "www.yourwebsite.com/sitemap.xml" into your browser. You now have a permanently up-to-date sitemap.xml, because it's generated on every request instead of by a third party whose output you then have to upload to your server.

How to improve it
My sitemap generator doesn't list my static pages (or even the homepage), because the homepage is already well indexed and I consider the other pages (contact, about, etc.) to be of no use to search engines. They're also easily reachable from the navigation, so Googlebot shouldn't have any trouble indexing them anyway.
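If you do want the homepage and static pages listed, a minimal sketch is to keep a hand-maintained array of their URLs and emit a `<url>` entry for each alongside the articles. The yourwebsite.com URLs below are placeholders, and `<lastmod>` is omitted because only `<loc>` is required by the sitemap protocol.

```php
<?php
// Placeholder URLs for pages that don't come from the database.
function staticPageUrls(): array
{
    return [
        'http://www.yourwebsite.com/',
        'http://www.yourwebsite.com/about.php',
        'http://www.yourwebsite.com/contact.php',
    ];
}

// One <url> entry (with just a <loc>) per static page.
function staticPageEntries(array $urls): string
{
    $xml = '';
    foreach ($urls as $url) {
        $loc = htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8');
        $xml .= "  <url>\n    <loc>{$loc}</loc>\n  </url>\n";
    }
    return $xml;
}

// In sitemap.php, between opening and closing <urlset>:
// echo staticPageEntries(staticPageUrls());
```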

Why it's good
Some hosts disable or restrict PHP's fopen function, which you would need in order to write a physical sitemap.xml file. This method bypasses creating a physical file altogether and instead serves the PHP file's output as the XML document.

Hope this was useful.

Comments


Is this the natural thing? But whatever it is... this is a pretty nasty solution.. Got the idea, and surely I’ll implement this in my website :D thanks!
This is what I was looking for. I am using Joomla CMS for a few of my sites, and I would love to have everything indexed in Google with automatic updates, as described in this article.

There may be several hundred links in my XML sitemap file, if not thousands. I am afraid that if Google accesses my sitemap.php to get the XML, it may time out.

Maybe for bigger sites it would be better to have sitemap.php create a physical sitemap.xml file.

Anyway, will work it out as mentioned.
This worked exactly as I needed, thanks for the code!
Yeah, I developed a similar method using fopen but found it needed a good crawler to go along with it to generate sitemaps. Try using this one and take a look:

http://www.hawkenterprises.org/2007/09/29/php-sitemap-generator-site-map-script.html
This is so useful it’s unreal, thanks!
Excellent article. I made a few changes to the code, as my DB stores the date as a Unix timestamp, so I had to add a couple more lines.

Excellent..
