21 02 2013

SharePoint 2010 Server Search Only Crawling Top Level Site

Jimmywim | headers, search, sharepoint | 2

This one cost me three evenings and turned out to be a silly little thing. I hope it can help someone else.

The scenario:

One web application has one site collection. Only pages in the top level site are being indexed by the crawler. These pages happen to be ones linked by the homepage. Another web application with a site collection and site hierarchy can be fully crawled. When I added a static link to a sub site, that sub site’s homepage (and the other pages it linked to) was crawled OK, but no other sub sites were. No SharePoint list items or documents in document libraries were being indexed in any site (even the top level site).

This was a UAT environment, and our live environment, which appeared to be configured identically, did not have this issue – the entire site collection on live was able to be crawled without issue.

Here’s what I checked:

Web App URL was set in the Content Source
Content Source was set as SharePoint Sites
Content Source was set to crawl entire web application
Crawler Account had Full Read on User Policy for the web app
Same behaviour when setting the Crawler Account to a site collection admin account
Same behaviour when crawling any AAM URL for the web app
Was able to log in as any set crawler account and browse the full site
There were no errors in the Crawl Log, only the 6 successes
There were no errors reported in ULS, even when bumped to Verbose
When using Fiddler as a reverse proxy, there were no errors reported as the crawler happily crawled the 6 pages it could see
Deleting and re-creating the Search Service Application from scratch had no effect
Removing all Scopes and Crawl Rules had no effect

What was the problem?

A missing ‘MicrosoftSharePointTeamServices’ header in the web application configuration in IIS, as I eventually found from a hint in this TechNet forum post (they are useful sometimes!):

http://social.technet.microsoft.com/forums/en-AU/sharepointadminprevious/thread/6fa317a5-9d2d-41a7-90f4-118050cf82dc

It turns out this header is required by the SharePoint crawler to ensure that the crawled site is indeed a SharePoint site and it can use the standard SharePoint APIs to discover the site’s content. If it can’t see this, the crawler treats the site as a standard static HTML web site and follows links to discover content. That is why I could only see a handful of pages that were linked from the homepage in the crawl log.

I was able to prove this by comparing the response headers returned by the UAT and live environments. When I noticed that the MicrosoftSharePointTeamServices response header was missing from the UAT environment, I knew where to start looking.
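If you want to script that header comparison rather than eyeball it in Fiddler, something along these lines should do. It is only a rough sketch: the function names are my own invention, it makes an unauthenticated request (so on a Windows-authenticated site you will most likely get a 401, whose headers are read instead), and I am assuming the custom header is sent on that response too.

```python
import urllib.error
import urllib.request

SHAREPOINT_HEADER = "MicrosoftSharePointTeamServices"


def fetch_headers(url, timeout=10):
    """Return the response headers for url as a dict keyed by lower-cased name."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.headers
    except urllib.error.HTTPError as error:
        # Error responses (e.g. 401) still carry headers, which is all we need.
        raw = error.headers
    return {name.lower(): value for name, value in raw.items()}


def missing_headers(reference, candidate):
    """List header names present in reference but absent from candidate."""
    return sorted(set(reference) - set(candidate))


def has_sharepoint_header(headers):
    """True if the lower-cased header dict contains the SharePoint header."""
    return SHAREPOINT_HEADER.lower() in headers
```

Call fetch_headers once against the live URL and once against the UAT URL (use your own host names), then pass both results to missing_headers(live, uat) to see what the broken environment has lost.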

If you come across this issue, beware that there are several places in an ASP.NET web site where you can control or alter the HTTP response headers: in a custom control (an ASCX user control or a compiled CLR type), in an HTTP module or, as was the case here, in the application configuration (done in IIS, but it can also be done in the web.config). Also make sure that when you manually recreate the header you set its value to the correct version of your SharePoint farm. Subsequent updates to your environment should keep this value up to date. (Actually, I’m unsure if the PSConfig wizard will update this header if it has been removed from a web app…).
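For the web.config case, here is a small sketch of how you might check whether the header is defined there and whether it matches your farm version. It assumes the header lives in a customHeaders section of web.config (it may instead be in the IIS applicationHost.config, which this does not read), that the file has no XML namespace on its elements, and it ignores any remove or clear entries. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

HEADER_NAME = "MicrosoftSharePointTeamServices"


def configured_header_value(web_config_xml, name=HEADER_NAME):
    """Return the value of a custom response header set in web.config, or None."""
    root = ET.fromstring(web_config_xml)
    # Look in every <customHeaders> section, wherever it sits in the file.
    for custom_headers in root.iter("customHeaders"):
        for element in custom_headers.findall("add"):
            if element.get("name", "").lower() == name.lower():
                return element.get("value")
    return None


def header_matches_farm(web_config_xml, farm_build_version):
    """True only if the header is present and equals the farm build version."""
    return configured_header_value(web_config_xml) == farm_build_version
```

Read the web.config contents into a string, pass it to configured_header_value, and compare the result with the build version your farm reports.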

Why was this removed?

Security (I guess, someone else did it but has since left the company).

Here’s a link to a blog post as to why it’s good security practice to remove these headers on public facing websites: http://www.marc-lognoul.me/itblog-en/post/2012/12/05/SharePoint-Removing-HTTP-Headers-for-Security-Reasons.aspx

But do beware: if you choose to follow these practices, ensure you have a web application available, which could be accessible only internally, that still sends the header so search works properly!


2 thoughts on “SharePoint 2010 Server Search Only Crawling Top Level Site”

Adel Refaat says:

July 8, 2013 at 7:09 am

Today, I faced the exact same issue, with only one success in the crawl log.
In SharePoint 2013 the mentioned HTTP response header is
MicrosoftSharePointTeamServices: 15.0.0.4505

Thank you James, you saved me hours of troubleshooting

Mike Nguyen says:

June 17, 2015 at 10:09 am

This is exactly what I need. Thank you very much.
You can check the SP build version with the PowerShell cmdlet (Get-SPFarm).BuildVersion
