TECHNICAL SEO FOR LARGE WEBSITES

You can get a first important glance into Googlebot's crawling behaviour through the Crawl Stats report in Google Search Console. It shows you how many pages of the domain have been crawled per day. If, for example, you have 1,000,000 URLs on your domain and Google crawls a mere 20,000 pages a day, it is quite possible that it will take a long time until each of your pages has been seen anew by Googlebot.
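
As a rough illustration of why these numbers matter, the Crawl Stats figures can be turned into a back-of-the-envelope estimate. The sketch below simply reuses the example figures from above; they are not real data.

    # Back-of-the-envelope crawl budget estimate (example figures only).
    total_urls = 1_000_000       # URLs on the domain
    crawled_per_day = 20_000     # average pages crawled per day, from the Crawl Stats report

    days_for_full_recrawl = total_urls / crawled_per_day
    print(f"Best case: every URL revisited roughly every {days_for_full_recrawl:.0f} days")
    # In practice Googlebot re-crawls popular URLs far more often than others,
    # so cold corners of the site will wait considerably longer than this average.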

Getting rid of large, unnecessary parts of a website can improve the average signals for the rest of the site. In the best case, your large-scale deindexing measure will only leave behind pages that are an essential part of your site and of above-average quality.

Let’s just ask them ourselves! We consulted top SEOs from the UK, Germany and Spain and asked them specific questions on this topic.

While our experts agree that having a strong brand will not influence your regular keyword research, they add that evaluating brand searches can be quite useful.

Skillsets are becoming more and more niche and SEO work more specialised – getting an SEO consultant can be helpful.

How do I deal with steering and monitoring Googlebot on extensive websites?

One of the most common obstacles is a lack of technical capacity to implement the changes at all. If you do not manage to secure the necessary budget and resources from your IT department or external contractor, you will quickly find that your well-laid plans are implemented only partially or, worst case, not at all.

In order to uncover keyword cannibalization you will need tools that can show you which keywords have multiple pages ranking, or where different pages take each other's place within a short time. Based on these results, you will need to set up an editorial decision process in which a single page is chosen as the best representation for that keyword.
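
As a minimal sketch of what such a check can look like, the snippet below groups a Search Console performance export by query and flags every query for which more than one URL ranks. The file name and column names are assumptions and depend on how you export the data.

    import csv
    from collections import defaultdict

    # Hypothetical export from the Search Console performance report,
    # with one row per (query, page) combination.
    pages_per_query = defaultdict(set)
    with open("gsc_performance_export.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            pages_per_query[row["query"]].add(row["page"])

    # Queries answered by more than one URL are cannibalization candidates
    # that an editorial decision should consolidate onto a single page.
    for query, pages in sorted(pages_per_query.items()):
        if len(pages) > 1:
            print(query, "->", ", ".join(sorted(pages)))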

At the same time, it is important to set up automated processes, such as unit tests, to make sure that everything is working as expected before each release. If you are working in an agile development process, setting up SEO checkpoints has proven to be a very effective method of getting ahead of possible errors.
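
A minimal sketch of such an SEO checkpoint, written as a pytest test against a handful of critical URLs; the URLs and the exact checks are assumptions and should be adapted to your own release process.

    import pytest
    import requests

    # Hypothetical list of URLs that must never break between releases.
    CRITICAL_URLS = [
        "https://www.example.com/",
        "https://www.example.com/category/leather-jackets/",
    ]

    @pytest.mark.parametrize("url", CRITICAL_URLS)
    def test_page_is_indexable(url):
        resp = requests.get(url, timeout=10)
        html = resp.text.lower()
        # The page must resolve, must not be flagged noindex via the header,
        # and should declare a canonical URL. A fuller check would also parse
        # the meta robots tag properly instead of relying on string matching.
        assert resp.status_code == 200
        assert "noindex" not in resp.headers.get("X-Robots-Tag", "").lower()
        assert 'rel="canonical"' in html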

SEO as an industry is constantly changing. In the future, we can expect that “getting things done” will get harder. You already have to know a wide array of technical topics that border on SEO and, in the future, this will only increase.

When you hire a consultant for a short-term project while also retaining an agency long-term, it can become a challenge to balance their goals. Make sure that the long-term goals of your organization are always front and center.

Both of these implementations will put you in a position to automatically supply a large number of pages with internal links. When deciding which method to use, it is important to have clear strategic goals for what you want to achieve with each method and with your navigation in general.

Set up meetings at the start of large projects for everyone involved and get in touch with other departments who also work on the same project.

Additionally, it is a good idea to keep an eye on the click depth Googlebot has to work through on your site in order to reach a specific piece of content. The deeper a page is hidden in the navigation or the internal link graph, the harder it becomes for Googlebot to find it in any reasonable amount of time.
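
One way to keep an eye on click depth is to compute it from your own crawl data. The sketch below assumes the internal link graph is already available as an adjacency mapping, for example exported from a crawler, and runs a breadth-first search from the homepage; the example graph is made up.

    from collections import deque

    # Hypothetical internal link graph: URL -> list of URLs it links to.
    links = {
        "/": ["/category-a/", "/category-b/"],
        "/category-a/": ["/product-1/", "/product-2/"],
        "/category-b/": ["/product-3/"],
        "/product-3/": ["/deep-page/"],
    }

    def click_depth(graph, start="/"):
        depth = {start: 0}
        queue = deque([start])
        while queue:
            page = queue.popleft()
            for target in graph.get(page, []):
                if target not in depth:          # first time this URL is reached
                    depth[target] = depth[page] + 1
                    queue.append(target)
        return depth

    # Pages that only appear at depth four, five or deeper are prime
    # candidates for additional internal links.
    for url, d in sorted(click_depth(links).items(), key=lambda item: item[1]):
        print(d, url)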

Two additional helpful strategies are using in-text internal links the way Wikipedia does, and setting up an automated system that adds links to similar and related pages, for example as part of an article's sidebar.
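
A very reduced sketch of such an automated related-links system, here based purely on overlapping tags; the articles and tags are invented, and a production system would of course work off your CMS data.

    # Hypothetical article -> tags mapping, e.g. pulled from the CMS.
    articles = {
        "/guide-to-crawl-budget/": {"crawling", "googlebot", "large-sites"},
        "/log-file-analysis/": {"crawling", "log-files", "googlebot"},
        "/faceted-navigation-seo/": {"duplicate-content", "large-sites"},
    }

    def related(url, limit=3):
        tags = articles[url]
        scored = [
            (len(tags & other_tags), other)
            for other, other_tags in articles.items()
            if other != url and tags & other_tags   # must share at least one tag
        ]
        return [other for _, other in sorted(scored, reverse=True)[:limit]]

    # Could be rendered into an article's sidebar as "related articles".
    print(related("/guide-to-crawl-budget/"))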

Which strategies should I implement when it comes to the crawling and indexing of large websites?

Using Google hacks, such as implementing both “noindex” and “follow” on pagination pages, can be an effective band-aid for a while. Problems can arise, though, if Google decides to change how they crawl and process these signals.

Usually, companies will not have in-house specialists for every topic. Big companies must rely on external resources for campaigns or for times when it gets busy. If your company does not need a resource all the time, it makes sense to bring an external consultant on board when necessary.

Large and extensive websites have their very own requirements. In order to implement good SEO for large websites, a company needs a team of technical and editorial staff and, on top of that, it is very helpful to have a project manager who ensures communication between everyone involved.

If hyperdynamic content from your website has managed to find its way into the index, you can confidently deindex it. The same goes for parameters and pagination. Both of these need to be handled in a technically clean way by your system. Please also keep in mind that the “noindex” attribute only affects indexing, not the crawling of your site.
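
This distinction is worth checking in practice, because a URL that is blocked from crawling in the robots.txt can never have its noindex seen. The sketch below reports both signals for a single URL; the domain and path are placeholders and the meta tag check is a rough string match.

    import urllib.robotparser
    import requests

    url = "https://www.example.com/some-page/"   # placeholder URL

    # Crawling: is the URL blocked for Googlebot via robots.txt?
    rp = urllib.robotparser.RobotFileParser("https://www.example.com/robots.txt")
    rp.read()
    blocked_from_crawling = not rp.can_fetch("Googlebot", url)

    # Indexing: does the response carry a noindex signal?
    resp = requests.get(url, timeout=10)
    noindex = (
        "noindex" in resp.headers.get("X-Robots-Tag", "").lower()
        or 'content="noindex' in resp.text.lower()
    )

    print("blocked from crawling:", blocked_from_crawling)
    print("marked noindex:", noindex)
    # If both are true, Googlebot will never see the noindex,
    # because it is not allowed to crawl the page in the first place.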

The Internet has many advantages. One of them is the possibility for a piece of content to stand the test of time and satisfy visitors for years and years. Unlike daily or weekly newspapers, you do not have to constantly create similar pieces of content just to fill up the paper.

If you do not yet have a system in place for your analyses, it may be a good idea to start with an ELK stack. This is a combination of Elasticsearch, Logstash and Kibana, which can help you collect, store and evaluate your log files.
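
Even before a full ELK stack is in place, a first pass over a single access log can already answer the most important question: where does Googlebot actually spend its time? A minimal standalone sketch, assuming a common combined log format and a file name that will differ on your servers:

    import re
    from collections import Counter

    # Assumes the combined log format, one request per line.
    REQUEST = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP')

    hits = Counter()
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for line in log:
            # Crude user-agent filter; verify via reverse DNS for real analyses.
            if "Googlebot" not in line:
                continue
            match = REQUEST.search(line)
            if match:
                hits[match.group("path")] += 1

    # The 20 paths Googlebot requests most often.
    for path, count in hits.most_common(20):
        print(count, path)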

SEO in large organizations – who needs to know what?

Getting rid of a large number of pages can also have a positive effect on your navigation and internal linking strategy by simplifying both, which in turn helps Googlebot crawl your resources more thoroughly and find more of the pages you really want Google to see.

Using XML, image, video and product sitemaps can help Googlebot better understand extensive websites. Create these files through an automated, algorithmic process; do not create them by hand for large and extensive websites.
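
A reduced sketch of such an algorithmic sitemap build: it splits a URL list into files of at most 50,000 entries, which is the limit of the sitemap protocol, and writes a sitemap index referencing them. The file names, the base URL and the URL source are placeholders.

    from xml.sax.saxutils import escape

    def write_sitemaps(urls, base="https://www.example.com/"):
        chunks = [urls[i:i + 50000] for i in range(0, len(urls), 50000)]
        index_entries = []
        for n, chunk in enumerate(chunks, start=1):
            name = f"sitemap-{n}.xml"
            with open(name, "w", encoding="utf-8") as f:
                f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
                f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
                for url in chunk:
                    f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
                f.write("</urlset>\n")
            index_entries.append(base + name)

        # Sitemap index pointing at the individual files.
        with open("sitemap-index.xml", "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for loc in index_entries:
                f.write(f"  <sitemap><loc>{escape(loc)}</loc></sitemap>\n")
            f.write("</sitemapindex>\n")

    # write_sitemaps(all_product_urls)   # e.g. pulled from the product database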

Behind every extensive website there is a large treasure trove of know-how. At SISTRIX we always have the goal of sharing first-hand SEO insider knowledge. Four UK SEO experts with extensive experience in working on large websites will exclusively share their insights with you.

When it comes to SEO for extensive websites, Kirsty Hulse, Patrick Langridge, Alex Moss and Stephen Kenwright can draw on their extensive experience of working with large projects and know how to implement the necessary processes in and for large organizations.

Once you start questioning whether specific pages or even whole parts of your website might have no business being in the index, you are, metaphorically speaking, trying to shut the stable door after the horse has bolted. Editorial articles and content, for example, might be ripe for an update instead of being deindexed.

Luckily, you are not limited to your own website when looking for useful content. Run a content gap analysis against your competition and discover which of their content is evergreen.

An important step on the road to preventing cannibalization is a strict and logical structure within the information architecture of the website. It should define a clear process for when new pages are created, which pages are created and where they are located within the structure of the site. It is important to prevent the haphazard creation of similar pieces of content by different teams, which then end up strewn randomly across your platform.

This occurs naturally when a product or service fits into more than one category, but if there’s no canonical (primary URL) set then the search engines will see multiple duplicate pages and be unsure which page to include in their index.

Big website = big chance of duplicate content issues.

It is also recommended that each page within the paginated series specifies a self-referencing canonical URL element.

Another common occurrence I discover when auditing large-scale sites is the existence of “soft 404 errors”. These are essentially pages on the site which no longer exist, yet don’t return the correct 404 header status code.
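
A minimal sketch for spotting soft 404s: feed it a list of URLs that you know should no longer exist, for example discontinued products, and flag everything that does not come back as a 404 or 410. The URL list is a placeholder.

    import requests

    # Hypothetical URLs of content that has been removed from the site.
    removed_urls = [
        "https://www.example.com/discontinued-product/",
        "https://www.example.com/old-category/",
    ]

    for url in removed_urls:
        resp = requests.get(url, timeout=10, allow_redirects=False)
        if resp.status_code not in (404, 410):
            # A 200, or a 302 to some generic page, points to a soft 404.
            print(f"Possible soft 404: {url} returned {resp.status_code}")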

“Crawl demand is how much Google wants to crawl your pages. This is based on how popular your pages are and how stale the content is in the Google index.”

Lack of Canonicalisation.

Further crawl waste can be minimised by properly managing your internal site search results pages. Googlebot may try throwing random keywords into your site search – meaning you’ll end up with random indexed pages.

This issue can also occur when large websites use a faceted navigation to allow users to locate products. A product category page on our client Hidepark Leather’s website, for example, offers many ways in which users can sort the products within a category, with multiple permutations and therefore the possibility for thousands of unique URLs to be generated. Depending on the scale of the site and the ways in which products can be sorted and viewed, failure to handle faceted navigation can lead to duplication issues on an enormous scale.

Ensuring that your website is accessible has always been best practice, but in recent years page load speed and site stability have become core considerations for Google when assessing the quality of a website.

First of all, the scale of these sites means that fundamental technical errors are likely to multiply many times over, increasing the total number of issues a search engine crawler will detect. Over time, these issues may downgrade the overall quality of the site and lead to indexing and visibility issues.

The nature of the content duplication typically falls into two core categories:

Ensuring your 404 page is helpful and engaging will help direct users back to your site’s valuable pages effectively.

Finally, although there’s no substitute for experience, it’s vital to continuously refer to Google’s own Webmaster Guidelines to sense-check any proposed fixes.

The first major issue which I’ve encountered is that some sites have no canonical tags in place at all.

Generally speaking, my own approach to the above issues, although it may be considered slightly hardline, is to block sorting parameters in the robots.txt by identifying all of the patterns and parameters within the sorted URLs.
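
As an illustration of that approach, the rules below block URLs carrying typical sorting parameters and, in the same spirit, internal site search results. The parameter and path names are assumptions and have to be matched against the patterns actually found in your own URLs.

    User-agent: *
    # Assumed sorting parameters; replace with the patterns from your own URLs
    Disallow: /*?sort=
    Disallow: /*&sort=
    Disallow: /*?order=
    # Internal site search results
    Disallow: /search/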

In order to identify and analyse the issues, you’ll need access to a few things, the first one being Google Search Console. There are various sections within GSC which will become your best friend when analysing technical SEO performance, namely the index and crawl areas of the interface.

Hard 404 errors.

According to Google, “this is a problem because search engines might spend much of their time crawling and indexing non-existent, often duplicate URLs on your site.” As a result, the unique URLs you want discovered may not be crawled as frequently, since your crawl coverage is limited by the time Googlebot spends crawling non-existent URLs.

Secondly, huge websites can present challenges for search engine crawlers as they try to understand the site structure, decide which pages to crawl and how long to spend crawling the site.

If this is happening, each URL will be treated as a unique URL – throw in incorrectly configured subdomains and protocols (www vs non-www and http vs https) and one URL can lead to five or six duplicates in existence.
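
A quick way to check this on your own domain is to request the protocol and subdomain variants of a URL and confirm that they all end up at one canonical address. The URLs below are placeholders.

    import requests

    # Hypothetical canonical address and its common variants.
    CANONICAL = "https://www.example.com/leather-jackets/"
    VARIANTS = [
        "http://example.com/leather-jackets/",
        "http://www.example.com/leather-jackets/",
        "https://example.com/leather-jackets/",
    ]

    for url in VARIANTS:
        resp = requests.get(url, allow_redirects=True, timeout=10)
        verdict = "OK" if resp.url == CANONICAL else "DUPLICATE RISK"
        print(f"{url} -> {resp.url} ({resp.status_code}) {verdict}")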

To avoid duplicate content issues of this nature:

Use Google Analytics to establish whether any of the listed URLs receive valuable traffic, and use a backlink analysis tool such as Ahrefs to check whether any important backlinks point to the broken URLs – you’ll then be better informed when applying 301 redirects to conserve the traffic and link equity passed to your domain from these URLs.

A canonical tag (aka “rel canonical”) is a method of notifying search engine crawlers that a specific URL represents the master copy of a page, and is helpful where duplicate or similar URLs may cause confusion for search engines.
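
In markup, that is a single link element in the head of every duplicate or variant URL, pointing at the master copy; on the master copy itself, and on each page of a paginated series, the element simply references the page’s own URL. The URL below is a placeholder.

    <link rel="canonical" href="https://www.example.com/leather-jackets/mens-brown-jacket/">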

The above issues are by no means an exhaustive list. I could produce a tonne of content around the weird and wonderful things I’ve witnessed and had to try and resolve when conducting technical SEO audits and indeed some of the issues above could be expanded upon and covered in greater depth (stay tuned).

Paginated content.

Here at Impression, our tool of choice is DeepCrawl.

Pagination is common on large-scale sites and occurs when content spans multiple pages as part of a categorised series.

More often than not, these pages are 302 (temporarily) redirected to a final location URL which then eventually returns a 200 OK status code.

For optimising crawl efficiency, where possible, I always recommend implementing rel="prev" and rel="next" to indicate the relationship between component URLs.

Google recommends that you configure your server to always return either a 404 (not found) or a 410 (gone) response code in response to a request for a non-existent page.

There are multiple options for handling duplicate content issues of this nature, covered in depth in this post over at Moz, but to summarise, the most common are (and sometimes a combination of):

A common misconception is that simply having 404 errors will cause your site’s rankings to suffer. Problems can arise, however, when valuable pages are moved to new URLs and not redirected correctly using 301 redirects.

Large-scale websites present challenges for both webmasters and SEOs for a number of reasons.
