Introduction and Guide to Latent Semantic Indexing (LSI) with SEO – Here’s how it might work.
Over the years, there’s been a lot of speculation about what Latent Semantic Indexing (or LSI as it’s more commonly known) actually is as a system used with SEO, how it’s used and more to the point, how a search engine like Google might be using it as a way to help assign meaning or a “relevance” score to a given web page against a given search keyword or term.
There are no clear-cut definitions of how it works, because there are derivations of the original scientific subject matter and, along with this, there are “opinions” of what a given search engine’s implementation may actually look like deep inside its algorithm. These, coupled with a nothing short of “astonishing” lack of understanding by the so-called “experts”, have given rise to a plethora of optimization tools, guides and documents, each describing a different thing, or in fact nothing at all, never mind the right thing.
An introductory beginner’s summary of Latent Semantic Indexing reads as follows:
“A method of identifying words and terms which are semantically or topically related to a given word or searched for string of words to aid in document retrieval.”
And that’s it in a nutshell. Of course we can go on to say how this is achieved and then speculate on how it might be of use as a strategy for improving the quality of our SEO to help your business marketing and get more search traffic – however speculation is all it is based on unless of course you work as a software engineer for a search engine which has implemented the LSI relationship concept as a page ranking factor.
Step 1 – Identification and Better Analysis of Semantically Related Words and Terms.
So on we go. Let us start by taking an example search term or phrase which we can use for the purpose of this document to query semantically related words. I am going to use more than a single word: I am going to use a person’s name, “Albert Einstein”. There we have it. Everyone knows the name, and better still, most people know what Albert is famous for.
As a search engine, prior to indexing a new page, I may want to find and perform a basic analysis on words and short phrases which are associated with “Albert Einstein” (because as the engine, I have already worked out, by means we can only guess at, that the page “wants” to rank for the scientist’s name). Luckily for me, I have billions upon billions of documents describing everything from how to make a cake to how to better understand LSI. Better yet, I have a machine powerful enough to learn about and process these unstructured documents.
The first step I take will be to look for all articles including the term I am searching for with my query. This will be known as my “document set”. Imagine if you will – I have just run a computer program which has collected around 46,000,000 web pages together, all of which contain “Albert Einstein”.
The next step is to extract every single word from every single one of these documents and build a simple matrix which contains each word found, and a count of the number of documents from within the set which use that word. Our matrix might look like this:
And so on, but you get the picture: relativity will be found in a high percentage of the pages which match our search term. In the example, all but a million of the 46,000,000 pages contain the word relativity. This is because, across the entire planet (don’t forget, everyone puts pages up these days), our famous scientist is often written about in conjunction with his theory. His theory is now semantically related to his name by the consensus of the entire World Wide Web. He is heavily bound to it, so the name is effectively synonymous with the term, even though the two are not true synonyms. That distinction matters, since it is semantics we are dealing with here.
The other words you see in the list are also related to Albert but not as strongly as those at the top, and the list goes on until eventually we reach the stage where there are no words which appear in 2 or more documents. We now have a table which contains exactly every word everyone ever published online alongside his name (more than once).
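The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three sample documents are invented, the function name document_frequency is hypothetical, and a real search engine would work at a vastly larger scale with far more careful text processing.

```python
from collections import Counter
import re

def document_frequency(documents, min_docs=2):
    """Count, for each word, how many documents in the set use it."""
    counts = Counter()
    for text in documents:
        # Use a set so a word counts once per document, however often it repeats.
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(words)
    # Keep only words that appear in at least min_docs documents.
    return {word: n for word, n in counts.items() if n >= min_docs}

# Invented toy "document set": every entry contains the search term.
docs = [
    "Albert Einstein published the theory of relativity",
    "Relativity made Albert Einstein famous",
    "Albert Einstein won a Nobel prize for physics",
]

df = document_frequency(docs)
# Print the table with the most widely used words at the top.
for word, n in sorted(df.items(), key=lambda kv: (-kv[1], kv[0])):
    print(word, n)
```

Words used in only one document (such as “famous” or “nobel” in this toy set) drop out, which mirrors the cut-off described above where the list ends once no word appears in 2 or more documents.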
At this point it is worth asking how accurate this methodology is in determining relevance. Consider for a moment the word “tarmac”. I haven’t checked, but I very much doubt there are many documents out there which reference the word “tarmac” alongside our famous scientist. With such enormous data sets, we can see how LSI will collate the most relevant documents by virtue of them containing the most commonly associated (written) words. Sounds useful? I think so.
Step 2 – Better Scoring and Indexing the Web Site Means Free Traffic for Your Business.
We (as a search engine) now have a key search term or query (a name in this case) and a web page, and now we need a relevance “score” for that page. A better score is what helps your business site get more traffic. How the score is calculated could vary hugely, but you can see that if latent semantics is involved in establishing relevancy within a document collection AND the web page contains a high number of high-scoring (by document frequency) semantically related words (or an even spread; again, this depends on the implementation), then the page will be given a high score and thus rank higher for the term it is indexed as being relevant for.
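One possible way to turn that idea into a number is sketched below. It is a guess at a scoring rule, not a description of what any search engine does: the function relevance_score is hypothetical, and apart from the 45,000,000 of 46,000,000 figure for “relativity” taken from the example above, the document counts are invented.

```python
import re

def relevance_score(page_text, doc_freq, total_docs):
    """Add up, for each related word found on the page, the share of the
    document set that uses that word."""
    page_words = set(re.findall(r"[a-z]+", page_text.lower()))
    return sum(n / total_docs for word, n in doc_freq.items() if word in page_words)

# Document frequencies within the "Albert Einstein" document set.
# Only the relativity figure comes from the example; the rest are made up.
doc_freq = {
    "relativity": 45_000_000,
    "physics": 30_000_000,
    "nobel": 20_000_000,
}
total_docs = 46_000_000

rich_page = "Einstein's theory of relativity reshaped physics"
thin_page = "Einstein quotes for your wall"

rich = relevance_score(rich_page, doc_freq, total_docs)
thin = relevance_score(thin_page, doc_freq, total_docs)
print(round(rich, 3), thin)
```

The page that uses widely shared related words scores higher than the one that only repeats the name. An implementation that rewards an even spread of related words, as mentioned above, would need a different formula.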
Another consideration when assigning the score prior to indexing, currently under research, is the use of LSI within weighted parts of the document. These might include the page title or h1 tags for example, where a “sprinkling” of related words could attract a higher rating than that applied at the time of prevalence counting. Every bit helps to enhance term meanings.
For example, we may see the word atom within a title tag. If it appeared in a p tag section, it might have a score of x, whereas in a title tag it might have a score of x + title value adjustment. Such adjustments could, in theory, increase the overall relevance of a document to its associated term.
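The “x + title value adjustment” idea might be sketched as follows. The adjustment values, the TAG_ADJUSTMENT table and both function names are invented purely for illustration; nothing here is known to match a real ranking algorithm.

```python
# Hypothetical per-tag adjustments; the numbers are made up for illustration.
TAG_ADJUSTMENT = {"title": 2.0, "h1": 1.0, "p": 0.0}

def word_score(base_score, tag):
    """Return the base score x plus the adjustment for the tag the word sits in."""
    return base_score + TAG_ADJUSTMENT.get(tag, 0.0)

def page_score(occurrences, base_scores):
    """Sum the adjusted scores of every related word found on the page.

    occurrences is a list of (word, tag) pairs; words without a base score
    are treated as unrelated and ignored.
    """
    return sum(
        word_score(base_scores[word], tag)
        for word, tag in occurrences
        if word in base_scores
    )

base_scores = {"atom": 1.0}  # x for the word "atom", an assumed value
occurrences = [("atom", "title"), ("atom", "p"), ("tarmac", "p")]

print(word_score(1.0, "p"))      # plain paragraph use: x
print(word_score(1.0, "title"))  # title use: x + title adjustment
print(page_score(occurrences, base_scores))
```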
The lengths to which a smart computer algorithm can go to in order to adjust document to term relevancy are endless – this is a very simplified look at what may be in place but the available resources would, I think, be sufficient.
Once the finding and indexing process is complete, the page becomes available for access by the search routines. When the name is entered into the search box – the returned results list can be sorted by those with the best relevancy – amongst a multitude of other factors, which may or may not outweigh the technical relevancy calculations.
Step 3 – When Writing Semantically for SEO, our LSI Strategy Tools can Help with Indexing.
With the above in mind, to be relevant for a given topic it is worthwhile to research and identify semantically related words within the context of your document, target these keywords as a strategy, and use them to write a detailed article with improved content and meaning. If you write with this in mind, your page is more likely to answer readers’ questions and may see an increase in visitors via SEO. Our LSI tools can help with this process.
As an example, it is generally accepted that it is no longer (and has not been for a long time) just a case of identifying a group of words, e.g. “pot, potting, pots, plantpot, potter”, and hoping to rank for pots. Although not proven, it is always worth bringing in specific words actually related to your context, as building a richer document may help with getting a higher ranking. Remember that LSI is about semantics, not just plurals, similar topics and other variations of the same word. Writing about pots, you might need words like “seeds”, “compost”, “roots” etc., depending on the type of queries you are hoping to get indexed for, along with a boost from Latent Semantic Indexing concepts.
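A simple way to check a draft against a list of related words is sketched here. The helper missing_related_terms and the sample draft are hypothetical, and the list of related terms is assumed to come from your own research; the check matches whole words only, so it would not treat “seed” as covering “seeds”.

```python
import re

def missing_related_terms(draft, related_terms):
    """Return the related terms that do not appear in the draft, in input order."""
    words = set(re.findall(r"[a-z]+", draft.lower()))
    return [term for term in related_terms if term.lower() not in words]

# Invented draft text and a related-word list taken from the pots example.
draft = "Our pots come in many sizes. Fill each pot with compost before planting."
related = ["seeds", "compost", "roots"]

missing = missing_related_terms(draft, related)
print(missing)  # terms you may want to work into the article
```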
If you’re still not convinced, think about this. Please keep in mind this is an experiment in language, words and text which shows the extreme. A document you have an idea for is under development. The document is about the concept of knowledge. Imagine where you might rank if the only word on the page was “knowledge”: how would a search engine determine any particular relevancy to something more specific? Now add the word “gain”. A user might see that and still not know what the page was about. A search engine would have no idea either. By adding another word, “school”, you can see that your simple three word page now looks more credible. Think on: to be successful you need more words, so add in “books”. The page is building on what it might mean. Now add in the word “tips” and carry on adding a thousand more related words and you have the big picture.
Understand LSI In Summary and Increase Your Search Engine Rank.
This is a summary, make no mistake. There is a lot of science and mathematics behind LSI: the detail of latent semantic analysis covers areas such as singular value decomposition and inverse document frequency, and it also has its known downsides. At the end of the day, remember that we don’t even know for certain how much of a role LSI actually plays in the day-to-day indexing and ranking of web pages by any search engine. But the above guide gives a good idea of what might be happening and, if nothing else, you can certainly use the overall concept (and our agency services, if needed) to find LSI keywords and enhance your existing documents’ search engine optimization. It may even help you create new ones with richer content to help with a rank increase overall.
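For readers curious about the mathematics, the core of latent semantic analysis is a singular value decomposition (SVD) of a term-document matrix. The sketch below uses NumPy on a tiny, invented matrix; the terms, the counts and the choice of k = 2 latent dimensions are all assumptions made for illustration, and nothing here is claimed to reflect how a search engine is actually built.

```python
import numpy as np

terms = ["relativity", "physics", "compost", "seeds"]

# Rows are terms, columns are documents; the counts are invented.
# The first two documents are about physics, the last two about gardening.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
], dtype=float)

# Decompose the matrix, then keep only the k strongest latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vectors = U[:, :k] * s[:k]  # each row places a term in the latent space

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity(term_a, term_b):
    """Compare two terms by the angle between their latent-space vectors."""
    return cosine(term_vectors[terms.index(term_a)],
                  term_vectors[terms.index(term_b)])

print(similarity("relativity", "physics"))
print(similarity("relativity", "compost"))
```

In this toy matrix the two physics terms end up pointing the same way in the reduced space, while “relativity” and “compost” end up unrelated, which is the kind of topical grouping the technique is meant to surface.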
If you would like a free Latent Semantic Indexing concept trial on your domain – complete the form below to get in touch and we will provide the information you need to address your strategy using our custom LSI keyword tool – free of charge!