Omicron Llama

Coding all day, every day.

Quickly Load Sample data into SharePoint from Wikipedia

Do remember that this application is designed solely to create sample content, not to replicate the service that Wikipedia provides. The content that appears is unformatted and without images. It is not intended for public use (not even on an internal intranet), so please adhere to the Wikipedia Terms of Service, and do also consider making a donation to the Wikimedia Foundation to support the freely available content that we have all taken for granted in these modern times.

Need to load a couple of hundred pages into your SharePoint publishing site for testing search? Bored of Lorem Ipsum, or need to test different content structures?

Run this code in a console application: it gets a random page from Wikipedia, then saves it to SharePoint. There is a slight pause (approximately 1 second) as it gets and saves each document, so this shouldn’t constitute heavy traffic to Wikipedia. As mentioned above, the content is added unformatted; in fact, it adds the entire HTML document to the Page Content field, including all the ‘edit’ links. That’s not best practice in the slightest, but all I was after was searchable content in my site.
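The fetch, save and pause loop described above can be sketched in Python as follows. This is an illustration, not the post’s console application: it assumes English Wikipedia’s `Special:Random` URL (which redirects to a random article) and takes the page title from the final URL after that redirect. `fetch_random_page`, `title_from_url`, `load_sample_pages` and the User-Agent string are hypothetical names and values, and `save` is a hypothetical callback standing in for whatever writes a page into your SharePoint site.

```python
import time
import urllib.parse
import urllib.request
from typing import Callable, Tuple

# English Wikipedia's random-article URL; it redirects to a random article.
RANDOM_URL = "https://en.wikipedia.org/wiki/Special:Random"

# Hypothetical User-Agent: Wikimedia asks clients to identify themselves,
# so replace this with something describing your own tool and contact.
USER_AGENT = "SampleContentLoader/0.1 (test data for a dev SharePoint site)"


def fetch_random_page(url: str = RANDOM_URL) -> Tuple[str, str]:
    """Fetch one random article and return (final_url, raw_html)."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    # urlopen follows the redirect, so geturl() gives the article's own URL.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.geturl(), resp.read().decode("utf-8", errors="replace")


def title_from_url(url: str) -> str:
    """Turn '.../wiki/Some_%28Title%29' into 'Some_(Title)'."""
    return urllib.parse.unquote(url.rsplit("/", 1)[-1])


def load_sample_pages(
    count: int,
    save: Callable[[str, str], None],
    fetch: Callable[[], Tuple[str, str]] = fetch_random_page,
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Fetch `count` random pages, hand each to `save`, pausing after each one."""
    saved = 0
    for _ in range(count):
        url, html = fetch()
        # The whole HTML document is passed on unmodified, as in the post.
        save(title_from_url(url), html)
        saved += 1
        # Pause so requests arrive at most about once per second.
        sleep(pause_seconds)
    return saved
```

For example, `load_sample_pages(200, save=my_save)` would pull 200 random articles, where `my_save` (hypothetical) writes a title and HTML body into your site. Injecting `fetch` and `sleep` keeps the loop testable without touching the network.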

Do note, however, that Wikipedia offers archives of its content here: http://dumps.wikimedia.org/backup-index.html. If you’re intending to populate hundreds of thousands to millions of documents to test large-scale search solutions, use these archives combined with the bulk load tool (available here: http://code.msdn.microsoft.com/windowsdesktop/Load-Bulk-Content-to-3f379974). That method requires downloading the entire archive (7GB worth) in one go, then converting it and uploading it to SharePoint directly. If you only need a couple of hundred sample documents to test with, use the method in this post instead, which avoids downloading the entire Wikipedia archive.


In SharePoint Enterprise Search, there is the option to add an external website as a content source and include it in your search index. I opted for the method in this post over that because a) indexing possibly hundreds of thousands of pages on sites I don’t run would add significantly more traffic to them than running this code does, and b) my environment doesn’t have the disk space for that kind of index :]


Finally, thanks to Todd Klindt and Shaun O’Callaghan for a couple of pointers and considerations for this article.


Inspiration for this method came from an answer on this StackExchange question.




Create a SharePoint Publishing Portal, change the URL in the code to point to your new site, and run it. You could add logic to the GetPageLayout method to retrieve a random page layout each time, which is useful for testing different information architectures, such as with the Refinements web part.
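The random-layout idea could look something like this Python sketch. `get_page_layout` mirrors the GetPageLayout name mentioned above but is not the post’s method, and the layout file names in `DEFAULT_LAYOUTS` are assumptions: check which page layouts your own publishing site actually offers and substitute those.

```python
import random
from typing import Optional, Sequence

# Assumed layout file names; replace with the page layouts your site provides.
DEFAULT_LAYOUTS = ("ArticleLeft.aspx", "ArticleRight.aspx", "ArticleLinks.aspx")


def get_page_layout(
    layouts: Sequence[str] = DEFAULT_LAYOUTS,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one page layout name at random from `layouts`."""
    if not layouts:
        raise ValueError("at least one page layout is required")
    # Use the supplied generator (e.g. a seeded one) or a fresh unseeded one.
    rng = rng or random.Random()
    return rng.choice(layouts)
```

Passing a seeded `random.Random` makes a run reproducible, so the same mix of layouts can be regenerated when comparing how search refinements behave across test loads.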
