Omicron Llama

Coding all day, every day.

Text-only Description field in Google News RSS Feeds in XSLT

One annoying thing about Google’s awesome RSS feed from its news site is the Description field.

If you look at the XML generated by http://news.google.com/news?q=sharepoint&output=rss, for example, you’ll see the Description field is flooded with unneeded, encoded HTML crap instead of the plain-text description you’d expect. (The RSS 2.0 specification does allow entity-encoded HTML in this element, so the feed is technically valid, just awkward to consume.)

What if you want to scrap all this crap from the Description field, leaving behind only the text that should be there? Simple: take a bog-standard “strip HTML” XSLT template (easy to find online after a quick Google/Bing), and pass it a trimmed-down version of the description field.

Like this. Here’s the “strip-HTML” template:
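A minimal sketch of such a template, assuming XSLT 1.0. The template name `strip-html` and the parameter name `text` are chosen here for illustration, and the fragment is assumed to sit inside an `xsl:stylesheet` element with the `xsl` prefix bound to the XSLT namespace:

```xml
<!-- Recursively removes anything that looks like a tag from $text.
     Names "strip-html" and "text" are illustrative assumptions. -->
<xsl:template name="strip-html">
  <xsl:param name="text"/>
  <xsl:choose>
    <!-- A '<' is still present: keep the text before it, then skip
         past the next '>' and process whatever follows. -->
    <xsl:when test="contains($text, '&lt;')">
      <xsl:value-of select="substring-before($text, '&lt;')"/>
      <xsl:call-template name="strip-html">
        <xsl:with-param name="text" select="substring-after($text, '&gt;')"/>
      </xsl:call-template>
    </xsl:when>
    <!-- No tags left: output the remaining text unchanged. -->
    <xsl:otherwise>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

This works on the description because the XML parser has already decoded the escaped markup, so the string handed to the template contains literal `<` and `>` characters. Entity references inside that encoded HTML are not tags, so a sketch like this leaves them behind as literal text.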

And this is the normal way to call it:
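A minimal sketch of such a call, assuming the template is named `strip-html`, takes a parameter named `text`, and is invoked while an RSS `item` is the context node (all three are assumptions for illustration):

```xml
<xsl:template match="item">
  <!-- Pass the whole description through the tag stripper -->
  <xsl:call-template name="strip-html">
    <xsl:with-param name="text" select="description"/>
  </xsl:call-template>
</xsl:template>
```

Called this way, every piece of text in the description survives, including any that belongs to the surrounding markup rather than the story summary, which is why the extra trimming step is worth doing.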

So what we do is look for tags which are always present around the text we’re interested in and strip them out. It might help to open the feed in Internet Explorer so it renders the encoded HTML, and then copy and paste the contents of a “description” tag into something that can format and tabulate the text (like Visual Studio). (Here’s the link for the RSS feed again: http://news.google.com/news?q=sharepoint&output=rss)

We can see that before the “real” description text there are the tags:

and toward the end there is

So we just use a bit of string manipulation in the XSLT to make sure we pass in anything in between these two “tokens”, like so:
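A sketch of that string manipulation, assuming XSLT 1.0 and a tag-stripping template named `strip-html` with a `text` parameter. The two token values below are placeholders, not the real markup: substitute the tags you actually find in the feed, escaping `<` and `>` as `&lt;` and `&gt;`.

```xml
<xsl:template match="item">
  <!-- Placeholder tokens (assumptions): replace with the tags seen
       immediately before and after the real description text. -->
  <xsl:variable name="startToken" select="'REPLACE-WITH-OPENING-TAGS'"/>
  <xsl:variable name="endToken" select="'REPLACE-WITH-CLOSING-TAGS'"/>

  <!-- Keep only what lies between the first start token and the
       first end token that follows it. -->
  <xsl:variable name="trimmed"
      select="substring-before(substring-after(description, $startToken), $endToken)"/>

  <!-- Remove any tags that remain inside the trimmed text -->
  <xsl:call-template name="strip-html">
    <xsl:with-param name="text" select="$trimmed"/>
  </xsl:call-template>
</xsl:template>
```

Note that `substring-after` and `substring-before` both return an empty string when their token is not found, so if Google changes the markup this will quietly produce an empty description rather than an error.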

3 thoughts on “Text-only Description field in Google News RSS Feeds in XSLT”

  • Shlomi Levi says:

    Hi James,

    Can you please give the source code for download?

  • Ayman says:

    Hi James,

    I’m trying to parse the Google News RSS feed and I ran into the same “description” issue.
    This blog is the only one I found that gives a simple solution. However, I can’t use it because I don’t know how to apply the XSLT to the description text using Java.
    Could you please update the article with a simple implementation?

    Thanks..

    • Jimmywim says:

      I’m not a Java developer so I don’t know what resources to look for to do this. However, if you search the internet and various Java developer forums you may find methods of transforming XML using XSLT with the JDK. The same XSLT should work though.
