RogerWheatley.com Media Rich Web

Search Engine Marketing or "How the New Google Patent May Affect You".

In March of 2005 Google obtained its patent, "Information retrieval based on historical data", which is described as: "...A system identifies a document and obtains one or more types of history data associated with the document. The system may generate a score for the document based, at least in part, on the one or more types of history data..." (A copy of that patent can be read here - Warning, it is a large document, about 95 KB.)

Upon reading the patent, several issues come to bear on current and new web sites. This article deals with issues web site owners need to be aware of in order to help develop a quality web site that is search engine or "Google Friendly".

While examining the Google patent, it is important to note that some of the ideas, concepts, etc. have already been incorporated into the current search algorithm. Some are currently pending for inclusion, and others may yet be included (if ever). The point is that Google appears to want to ensure that many ranking techniques remain under its control. This is a prudent business move in my opinion. Google has not indicated which of the ideas or concepts are currently included in the algorithm. One thing that was interesting to note is that new links or sites are stored for about a month before they are evaluated. (In the web development industry this is referred to as the "Google Sandbox".)

In addition to the emphasis placed on "Information retrieval based on historical data", it is also rather apparent that the prime method of sorting and cataloging search results is via the evaluation of "Organic Linking". In a nutshell, organic linking refers to the garnering of topic-relevant inbound links to the appropriate areas of your web site. Although it is a slower method of building inbound links, it is a more powerful way to help boost the quality of your web site. (Or at least that's how it appears in the patent description.) Why is this important? Well, Google needs to remain competitive (by returning valid search results), so it is important for it to motivate web site owners to maintain a proactive stance against the "darker" methods of attempted search engine optimization (sometimes referred to as link spamming). As the Internet continues to grow and become ever more complex, operations such as Google cannot realistically afford to pander to link-spamming techniques, which would serve to dilute the validity of the search results returned. Engage in the "darker" methods of optimization and you may find your web site sandboxed - permanently!

To facilitate this, Google must gather data when it examines a page and correlate this with the historical data about the page. There are various sections within the patent that pertain to this. Most notable are:

  • Which computer or who is clicking links.
  • Where the link clicked is coming from.
  • How old the page is.
  • When the page was first seen.
  • How often a link to the page is clicked.
  • If the web page is a duplicate or not.

One area dealt with is important to new web site owners, as it concerns the growth of a web site. After all, how much historical information can there be for a brand new web site? Earlier I referred to the "darker" web optimization techniques (which appear to no longer bode well for those using them). For example, suppose you have developed a new web site and are working to garner inbound links (with the understanding that an inbound link is a "vote of confidence", so to speak). In doing so, you gather about 1000 inbound links over the period of a month. Notwithstanding the "Organic Linking" issue, there's a caveat here...

If a web site has a higher-than-average growth rate for inbound links based on specific keywords, and there is not a corresponding growth in search volume for those keywords, then it appears the growth may be assumed to be spam-linking. You risk being sandboxed. On the other hand, let's say the sudden growth is due to a current fad or trend and you sell a product or service that caters to this trend. In all likelihood you would not be sandboxed, as the links could be assumed to be legitimate (during the trend). Why? During a trend or fad there would be a corresponding sudden growth in search keyword volume.
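To make the comparison above concrete, here is a minimal Python sketch of the idea as I read it from the patent. It is not Google's actual method: the function names, the monthly counts and the threshold of 3.0 are all assumptions made up for this illustration.

```python
def growth_ratio(previous, current):
    """Return how many times larger `current` is than `previous`."""
    # Treat a zero baseline as 1 to avoid dividing by zero (an assumption).
    return current / max(previous, 1)


def looks_like_link_spam(links_prev, links_now, searches_prev, searches_now,
                         threshold=3.0):
    """Flag link growth that is not matched by growth in search volume.

    All inputs are hypothetical monthly counts for one keyword; the
    threshold of 3.0 is an arbitrary illustrative value.
    """
    link_growth = growth_ratio(links_prev, links_now)
    search_growth = growth_ratio(searches_prev, searches_now)
    # Suspicious only when links surge but interest in the keyword does not.
    return link_growth >= threshold and search_growth < threshold


# New site: links jump from 10 to 1000 while searches stay flat.
print(looks_like_link_spam(10, 1000, 5000, 5200))    # True
# Fad: links and searches for the keyword surge together.
print(looks_like_link_spam(10, 1000, 5000, 40000))   # False
```

The second call mirrors the fad scenario: the same jump in links is no longer flagged because search volume for the keyword grew along with it.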

The patent implications to web site owners can perhaps be broken down into six sections:

Documents

Do not change the focus of several documents on your web site at the same time. Why?

Document changes are recorded both for the degree (amount) of change and for how often they occur. Changes in the weighting of document keywords, as well as changes in the documents that link to the document being changed, are also recorded.

However, it is important to "freshen" the contents of some documents so that they remain current. Why?

Documents are assigned a "staleness" score. The more stale the document, the less likely it is to appear in search results.

Be careful where your documents link to externally:

Google may monitor one or a combination of the following factors:

  1. The extent to and rate at which advertisements are presented or updated by a given document over time.

  2. The quality of the advertisers; a document whose advertisements refer/link to documents known to search engines over time to have relatively high traffic and trust may be given relatively more weight than those documents whose advertisements refer to low traffic/untrustworthy documents.

  3. The extent to which the advertisements generate user traffic to the documents to which they relate.

Domain Information

If you are serious about promoting your business (or personal) web site (and its domain name), it appears to be in your best interests to invest in a longer term for your domain name registration. Here is what Google included in the patent:

"Domains can be renewed up to a period of 10 years. Valuable (legitimate) domains are often paid for several years in advance, while doorway (illegitimate) domains rarely are used for more than a year. Therefore, the date when a domain expires in the future can be used as a factor in predicting the legitimacy of a domain and, thus, the documents associated therewith."
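As a rough illustration of how an expiry date could be turned into a usable factor, consider the Python sketch below. The patent excerpt does not say how the date is actually weighed, so the function name, the 0.0 to 1.0 scale and the way the 10 year cap is applied are my own assumptions.

```python
from datetime import date


def registration_signal(expires_on, today, max_years=10):
    """Map remaining registration time to a value between 0.0 and 1.0.

    Hypothetical scale: 0.0 means expired or about to expire, 1.0 means
    registered for the full `max_years` (10, per the patent quote).
    """
    remaining_days = (expires_on - today).days
    # An already expired domain counts as zero remaining time.
    remaining_years = max(remaining_days, 0) / 365.25
    return min(remaining_years / max_years, 1.0)


today = date(2006, 1, 1)
# Doorway-style domain: six months left on the registration.
print(round(registration_signal(date(2006, 7, 1), today), 2))   # 0.05
# Domain paid for ten years in advance.
print(round(registration_signal(date(2016, 1, 1), today), 2))   # 1.0
```

On its own such a number proves nothing; the quote only says it "can be used as a factor", presumably alongside the other signals discussed in this article.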

Additionally, in order to help promote your web site within Google, it's advisable that your web site contain several pages of quality content. The patent indicates that domains should contain more than one web page. Create quality documents related to your services or product. In addition to describing them, it is important (for marketing) to list the benefits of a product or service.

Remember to keep your domain's subject in focus. If you have a new business opportunity to launch, it might help to put it on a separate domain if you are going to add a large volume of pages related to the new topic. If not, the existing anchor text might be ignored. Here's the input from the Google patent:

"...determining a date when the text of a document changes significantly or when the text of the anchor text changes significantly. All links and/or anchor text prior to that date may then be ignored or discounted."

An interesting development is to capitalize on which server your hosting account (and your domain's web pages) resides on. Here's what I mean by that:

Make sure the nameserver that hosts your domain record (domain name --> IP address) has a mix of different domains from different registrars. It appears Google considers this "Good". If you need to learn more about the function of a nameserver (DNS server) and its importance, simply send your question off here. One statement of Google's treatment of nameserver information is:

"...the age, or other information, regarding a name server associated with a domain may be used to predict the legitimacy of the domain. A "good" name server may have a mix of different domains from different registrars and have a history of hosting those domains, while a "bad" name server might host mainly pornography or doorway domains, domains with commercial words (a common indicator of spam), or primarily bulk domains from a single registrar, or might be brand new. The newness of a name server might not automatically be a negative factor in determining the legitimacy of the associated domain, but in combination with other factors, such as ones described herein, it could be."

User Information

Various areas of the patent refer to user information. Ensure the development of your web content capitalizes on some of the following patent issues.

The time visitors spend on your web site may be an indicator of the quality of web page "freshness" or "staleness".

"If a document is returned for a certain query and over time, or within a given time window, users spend either more or less time on average on the document given the same or similar query, then this may be used as an indication that the document is fresh or stale, respectively."
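The quoted passage can be sketched as a simple comparison of two time windows. The Python below is only an illustration: the function name, the sample visit times and the 10% tolerance band are assumptions of mine, and it assumes the earlier window has a positive average.

```python
from statistics import mean


def freshness_signal(earlier_seconds, recent_seconds, tolerance=0.10):
    """Compare average time on page between two time windows.

    Returns "fresh" if the recent average is higher, "stale" if it is
    lower, and "unchanged" if the difference is within `tolerance`
    (a made-up 10% band so that small fluctuations are ignored).
    """
    earlier_avg = mean(earlier_seconds)
    recent_avg = mean(recent_seconds)
    # Relative change in average time spent on the page.
    change = (recent_avg - earlier_avg) / earlier_avg
    if change > tolerance:
        return "fresh"
    if change < -tolerance:
        return "stale"
    return "unchanged"


# Hypothetical seconds spent on one page for the same search query.
print(freshness_signal([60, 80, 70], [30, 40, 35]))    # stale
print(freshness_signal([60, 80, 70], [90, 100, 95]))   # fresh
```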

Try to write content that may motivate a visitor to bookmark the page. It seems there are a couple of sections which indicate that additions (or removals) of bookmarked pages may be monitored. Obviously, visitors who bookmark a page intend to return to it.

Visitor traffic to your web pages may be recorded and monitored for changes in traffic patterns, as may behaviour within your web site (anchor text click-throughs, back button usage, etc.). Remember that all browsers maintain a cache and history files.

Search Results and Click Rates

Do your best to ensure you have high quality content with relevant links available. More clicks may mean greater value for the web page being "clicked to".

The freshness or staleness of a web page is also monitored in the context of preference, that is, whether a visitor clicks on a stale or a fresh web page link in the results of their search query.

Click trends (covered throughout the patent) are also monitored for increases or decreases. However, seasonality (and fads) are also taken into consideration.

The web page rankings are recorded and changes in them are monitored (more comments about this in the "Generalized Issues" section below).

Link Information

For such a system to function (because it is related to historical data), keep in mind that link anchor text is recorded along with the date of its discovery. It is important to set your links, keep them static, and point them at quality, relevant documents. These links, over time, gain greater value than links that change often.

When developing your content, keep the following in mind:

New web sites are not expected to have a large number of links. (This takes time to grow - Remember to use "Organic Linking")

Inbound links are worth more than outbound links.

"...it may be assumed that a document with a fairly recent inception date will not have a significant number of links from other documents (i.e., back links). For existing link-based scoring techniques that score based on the number of links to/from a document, this recent document may be scored lower than an older document that has a larger number of links (e.g., back links). When the inception date of the documents are considered, however, the scores of the documents may be modified (either positively or negatively) based on the documents' inception dates."

Make sure your web site does not have a sudden growth of inbound links as this could be considered an indicator of spam-linking/marketing. Do not use link farms, referrer logs, guest books and the like.

"The dates that links appear can also be used to detect "spam," where owners of documents or their colleagues create links to their own document for the purpose of boosting the score assigned by a search engine. A typical, "legitimate" document attracts back links slowly. A large spike in the quantity of back links may signal a topical phenomenon (e.g., the CDC web site may develop many links quickly after an outbreak, such as SARS), or signal attempts to spam a search engine (to obtain a higher ranking and, thus, better placement in search results) by exchanging links, purchasing links, or gaining links from documents without editorial discretion on making links. Examples of documents that give links without editorial discretion include guest books, referrer logs, and "free for all" pages that let anyone add a link to a document."

One concept related to links developing over time is the idea that stale documents which continue to gain new links will be considered fresh documents. Again, this is why it is important for web site owners to develop quality content and use organic linking.

"...For example, assume that a document S is 2 years old. Document S may be considered fresh if n % of the links to S are fresh or if the documents containing forward links to S are considered fresh. The latter can be checked by using the creation date of the document and applying this technique recursively."
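The recursive check described in that passage might look something like the Python sketch below. It is a simplification under my own assumptions: the data layout, the 180 day cut-off, the default of n = 50 and the handling of circular links are all invented for the example, and it treats a link as fresh when the document containing it is fresh.

```python
def is_fresh(name, docs, max_age_days=180, n_percent=50, _path=frozenset()):
    """Decide whether a document counts as fresh.

    `docs` maps a document name to its age in days (from its creation
    date) and the names of the documents that link to it. A document is
    fresh if it is recent itself, or if at least `n_percent` of the
    documents linking to it are fresh (checked recursively).
    """
    if name in _path:
        # Circular linking: do not let documents vouch for each other.
        return False
    doc = docs[name]
    if doc["age_days"] <= max_age_days:
        return True
    linkers = doc["linked_from"]
    if not linkers:
        return False
    path = _path | {name}
    fresh = sum(is_fresh(l, docs, max_age_days, n_percent, path)
                for l in linkers)
    return 100 * fresh / len(linkers) >= n_percent


# Hypothetical documents: S is 2 years old, as in the patent example.
docs = {
    "S": {"age_days": 730, "linked_from": ["A", "B", "C"]},
    "A": {"age_days": 30, "linked_from": []},
    "B": {"age_days": 400, "linked_from": ["A"]},
    "C": {"age_days": 900, "linked_from": []},
}
print(is_fresh("S", docs))   # True: A and B count as fresh, C does not
print(is_fresh("C", docs))   # False: old and nothing links to it
```

Here S is two years old, yet it is still treated as fresh because two of the three documents linking to it are fresh.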

Needless to say, the analysis of this topic within the patent could go further. It is apparent that common sense must prevail for web site owners. Obviously a new web site is not expected to have full content and fully developed links. Concentrate on developing your web site over time, gaining links over time, etc. Gone are the days when a web site is not launched until all text, keywords, etc. are complete. Add the pages as they are developed; that is expected. Also remember, when your web site is first "discovered" it is sandboxed for about a month until evaluation.

Generalized Issues

Your web page rankings are recorded and changes to them are monitored. If those changes are frequent then you risk the web page losing its trustworthiness! In other words, stop playing around with the page just to try to boost its ranking - You would be wasting your valuable time! (Your focus should be on developing quality content, NOT page ranking!)

"According to an implementation consistent with the principles of the invention, information relating to prior rankings of a document may be used to generate (or alter) a score associated with the document. For example, search engine 125 may monitor the time-varying ranking of a document in response to search queries provided to search engine 125. Search engine 125 may determine that a document that jumps in rankings across many queries might be a topical document or it could signal an attempt to spam search engine 125."
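One way to picture "jumps in rankings across many queries" is the Python sketch below. The function name, the sample queries and both thresholds (a jump of 10 positions, on at least half of the queries) are hypothetical values chosen for illustration; the patent excerpt gives no such numbers.

```python
def jumped_across_many_queries(previous, current, min_jump=10, min_share=0.5):
    """Report whether a page rose sharply for a large share of queries.

    `previous` and `current` map a search query to the page's position
    in the results (1 is the top). Only queries present in both
    snapshots are compared.
    """
    shared = previous.keys() & current.keys()
    if not shared:
        return False
    # A smaller position number is better, so a rise is previous - current.
    jumps = sum(1 for q in shared if previous[q] - current[q] >= min_jump)
    return jumps / len(shared) >= min_share


# Hypothetical ranking snapshots for one page, a month apart.
before = {"red widgets": 45, "buy widgets": 60, "widget shop": 38}
after = {"red widgets": 5, "buy widgets": 12, "widget shop": 36}
steady = {"red widgets": 43, "buy widgets": 58, "widget shop": 36}

print(jumped_across_many_queries(before, after))    # True
print(jumped_across_many_queries(before, steady))   # False
```

Note that a result of True cannot tell a topical document from a spam attempt; as the quote says, a jump could mean either, so it would only mark the page for a closer look.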

Try not to change your keywords often. If keywords change often, then your domain's content should change often too. A blog or portal type domain is a good example; a static site without user input of articles, comments, etc. is not.

Again, there are numerous issues, and I have attempted to pinpoint the most pertinent points that may affect each of us and that we can address in the development of our own domains. It is apparent that the changes within the Google patent reach further than a simple search tool to help ensure the relevancy and freshness of search results. In conclusion, my own impression and advice is to build relevant links (including organic linking) and keep all your content focused on the subject of your web site, with appropriate navigation and keywords/phrases for search spiders (Google) to crawl. It sounds simple, but it is a great challenge. Enjoy it!

Roger Wheatley often provides public seminars and presentations for businesses and organizations. If you would like to book a seminar for your business or organization, or are interested in learning more, please contact us.

 
   
   
Copyright. ©1999-2006 Roger Wheatley. All rights Reserved.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License.