Latent Semantic Analysis

Lately, people who follow search engines have been using a buzzword: LSI, or Latent Semantic Indexing. For example, you can query Google for the phrase women portal like this: ~women portal

The tilde (~) before the search phrase tells Google that you also want semantically related words to be considered. Put simply, Google will try to look for the logical extension of the supplied phrase. In practice, it is a lot more complicated than this.

One thing I like about the Internet is the way it offers a level playing field to everyone, whether you are a 900 lb gorilla or a timid 6 lb Chihuahua. We have been tinkering with LSA for some time now, about a couple of years. Our agenda is much simpler in nature: to deliver the right page from our thousands of pages of content for a given search phrase. You may have noticed a search box on our main page and elsewhere, with some mumbo jumbo about natural language navigation.

To tell you the truth, without much technical detail or hype, LSA (Latent Semantic Analysis) is a simple behind-the-scenes process in which a computer program figures out the concept behind a word or phrase and identifies the matching content. Most writers use different words or phrases to describe the same idea or concept. Even the most painstaking editing before publication will not weed out each writer’s individual habits of language, where the same words mean different things to different people. Editors can enforce a consistent style and language across the entire website, but they can do little to bring homogeneity to the choice of words.

For the technically inclined, this is called synonymy: many words exist to describe a single idea. In contrast, polysemy describes a word or phrase with multiple meanings, which is another problem for our search engine approach. Some of the words people search with might bring up the wrong page.

Human languages are probably among the most complicated things a computer can be asked to handle. The subtle nuances of a language are not easy to quantify in objective terms. People often intuitively “arrive” at the intended meaning of the written word from the position of the words in relation to each other. Cues such as the modulation of the voice and the emphasis placed on syllables, which exist in speech, don’t exist on a written page.

The only cues left to analyze are the position of the words relative to each other and how often they occur in each article page. Most of our pages contain thousands of occurrences of common words, which receive less weight than the distinctive primary keyword phrases. These ‘weighted’ phrases are then factored into our search engine, along with their synonyms, for classification.
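To make the weighting idea concrete, here is a minimal sketch using the well-known TF-IDF scheme. This is an assumption for illustration, not a description of our actual engine: the function names and the three tiny sample pages are made up. A word that appears on every page gets a weight of zero, while a word confined to one page scores highest.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_weights(pages):
    """Return {page_name: {word: weight}} for a dict of page_name -> text."""
    tokens = {name: tokenize(text) for name, text in pages.items()}
    n_pages = len(tokens)

    # Document frequency: on how many pages does each word appear?
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    weights = {}
    for name, words in tokens.items():
        counts = Counter(words)
        total = len(words)
        # Term frequency on this page, scaled down for words common across pages.
        weights[name] = {
            word: (count / total) * math.log(n_pages / doc_freq[word])
            for word, count in counts.items()
        }
    return weights

# Hypothetical sample pages, invented for this example.
sample_pages = {
    "atkins": "the atkins diet limits the carbohydrates in the diet",
    "zone": "the zone diet balances the protein and the fat",
    "yoga": "the yoga routine improves the posture",
}
sample_weights = tfidf_weights(sample_pages)
```

In this sample, "the" occurs on all three pages and ends up with no weight at all, whereas "atkins" outweighs "diet" on the first page because "diet" is shared with another page.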

To cut a long story short, we decided to use an extensive dictionary of English words to help our version of LSA. Sometimes it is really thrilling to see our internal search engine deliver the most appropriate article for a search phrase with relative ease. On the other hand, a contrived phrase can still stump it, though this happens relatively rarely.
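The dictionary idea can be sketched as simple query expansion. Note that this is plain synonym lookup rather than LSA proper, and everything in it is hypothetical: the tiny synonym groups, the function names and the sample pages stand in for a real dictionary and real content.

```python
import re

# A toy stand-in for an extensive dictionary: each set is one group of synonyms.
SYNONYM_GROUPS = [
    {"woman", "women", "lady", "female"},
    {"portal", "site", "website"},
    {"diet", "nutrition"},
]

# Map every word to the group it belongs to.
LOOKUP = {word: group for group in SYNONYM_GROUPS for word in group}

def words_of(text):
    """Lowercase the text and return its alphabetic words as a list."""
    return re.findall(r"[a-z]+", text.lower())

def expand_query(phrase):
    """Return the query words together with all their known synonyms."""
    expanded = set()
    for word in words_of(phrase):
        expanded |= LOOKUP.get(word, {word})
    return expanded

def best_page(phrase, pages):
    """Pick the page whose text shares the most words with the expanded query."""
    terms = expand_query(phrase)
    return max(pages, key=lambda name: len(terms & set(words_of(pages[name]))))

# Hypothetical pages: neither contains the literal words "women" or "portal".
demo_pages = {
    "home": "a website for every lady",
    "garage": "car repair tips",
}
demo_match = best_page("women portal", demo_pages)
```

Here the query "women portal" still finds the "home" page, because "lady" and "website" sit in the same synonym groups as the words actually typed.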

The technology to negotiate the vast realms of human language is still nascent, and our LSA is still in beta.

To go back to the first example of the LSA concept, where we used Google to look for semantically related words matching the phrase women portal: you should see many occurrences of lady, woman, female and so on in the search results page. It is also in beta …


3 Steps to Effective Blog Writing

The Internet is, in a way, akin to instant coffee and fast food. In conventional print media, it takes time to create and sustain a following, but the effect lasts much longer than it does on the Internet. The Internet is more effervescent.


Writing for such a medium calls for a different set of skills. I intend to write about the techniques for the most effective blog. It is not going to be short or brief.


1.     Passion – Unless you have a passion for the subject, people are going to move on and click away. Unless you have a passion, you will not be thorough in your knowledge. And unless you have detailed information about the subject you are passionate about, why would anyone be interested in reading it?


If you are passionate about your subject, dig up everything possible about it. Read, research and collate the material. You will be glad you treated it with the seriousness it deserved from the very beginning, because later on it is difficult to sustain the tempo.

2.     Patience – Patience is a virtue for some, but for us it is vital. As I mentioned earlier, when you deal with an effervescent medium like the Internet, you need to be aware of its flip side too. The Internet offers you a level playing field with the high and mighty. Equally, it can shred the impact of your online presence. You need enormous patience, from conceiving the topic to eventually putting flesh on the skeleton of the work. Remember that time is not a consistent unit, even when it comes to the Net.


There is a parable from an ancient Hindu text about the concept of time:

Brahman (God) and Naran (man) were walking along the beachfront in the distant past, when it was probably normal for them to interact. Brahman grew thirsty and asked his companion Naran to get him a glass of water. Naran dutifully ran across to the nearby village.

It so happened that the door of the first house he knocked at was opened by a beautiful woman, and Naran forgot what he had come for. Eventually Naran married the woman and settled down in the village. Many years passed before calamity struck in the form of an overflowing river, which flooded the village. Naran’s family was swept away, and Naran clung to a tree for dear life. The receding water around him finally reminded him of his quest for a glass of water for Brahman.


He rushed to the beachfront with a glass of water and begged Brahman’s forgiveness for his long delay. To his complete surprise, Brahman said, “Ah, you are back so soon,” implying that their sense of time was not the same.




You don’t become a celebrity overnight when you write your first blog post. You would need to write about a subject that most people are interested in but that no one has written about, which is almost impossible. It takes patience in good measure to see your work bear fruit.




3.     Style – Develop your own unique style. Have you noticed that, most of the time, every page on the Internet looks like every other one? Style is what shows that you are unique; it is characteristic of your personality and carries your stamp of authority. If you plan on writing for the Net on a long-term basis, it is imperative that you develop your own brand of style. Use humor, subtle wit, pungency or sheer arrogance, depending on what you want your readers to feel.


Look at all the blogs that perform well. They share one thing in common: the authors are identifiable by their stamp of style.


How do you know that what I have said above really works?

Simple. We have conceived, written, presided over and edited more than 3000 pages of content for the Internet in the last 3 years.





Managing a mid size portal – Internal Navigation

It is ironic that as you grow in size on the Internet, you become more dependent on free forms of advertising. Search engines contribute a major chunk of the traffic for all small and mid size portals until you establish your brand, by which time you have become one of the major players in your niche segment.


There is a paradox in this, as Yossi Vardi, the founding investor of Mirabilis, the maker of ICQ, has mentioned: “The value of any Web site is in inverse relation to what it costs to attract new users.” One of the major costs of running an Internet venture is advertising, and the elusive goal is to get the maximum return on the money invested in ads.


Search engine results are, at the best of times, fortuitous. Consider this typical example involving the world’s leading software giant, Microsoft. (The domain was registered in 1991.)


A Google search for the keyword ‘software’ does not show Microsoft in the first 50 results.

In Yahoo, Microsoft comes in 6th position for the same search. Interestingly, in Microsoft’s own MSN search engine it does not appear in the first 5 pages. Your mileage might vary, as the search engines keep changing the order.


Even when someone searches for pages within your own website, the search engines don’t display them in their order of importance. This is especially evident for a portal like TargetWoman. In our diet section, we cover almost all the major mainstream diets invented since the time of sliced bread. Yet a typical searcher rarely reaches the right page directly through the search engines. As we can glean from our server log files, visitors end up on the wrong page of the diet section and invariably have to navigate to the appropriate article.
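For the curious, the kind of log reading described above might look like the sketch below. It assumes the log has already been reduced to (visitor, referrer, path) records in time order; the record layout, the list of search engine markers and the sample paths are all hypothetical, not our real log format. It counts, for each search engine landing, which page the visitor clicked to next.

```python
from collections import Counter

# Substrings that mark a referrer as a search engine (an illustrative list only).
SEARCH_ENGINES = ("google.", "yahoo.", "msn.")

def landing_to_next(hits):
    """Count (landing page, next page) pairs for visits that start at a search engine.

    `hits` is a list of (visitor_id, referrer, path) tuples in time order.
    """
    waiting = {}       # visitor_id -> landing page still waiting for a follow-up click
    pairs = Counter()
    for visitor, referrer, path in hits:
        if any(engine in referrer for engine in SEARCH_ENGINES):
            waiting[visitor] = path
        elif visitor in waiting:
            pairs[(waiting.pop(visitor), path)] += 1
    return pairs

# Hypothetical records: two visitors land on one diet page, then move to another.
sample_hits = [
    ("v1", "http://www.google.com/search?q=zone+diet", "/diet/atkins.html"),
    ("v1", "/diet/atkins.html", "/diet/zone.html"),
    ("v2", "http://search.yahoo.com/search?p=zone+diet", "/diet/atkins.html"),
    ("v2", "/diet/atkins.html", "/diet/zone.html"),
    ("v3", "", "/index.html"),
]
sample_pairs = landing_to_next(sample_hits)
```

A pair that shows up again and again, such as the same landing page followed by the same second page, hints that the search engines are sending people to the wrong article for that query.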


To help them traverse our thousands of article pages, we had to build a search engine from scratch ourselves. After much head scratching and gallons of coffee, we are almost there (not quite). ‘Browse by topics’ is one such tool, intended to cut the time visitors spend navigating inside the portal so that they can spend it reading the content instead.

Our parent site TargetWoman, the leading women’s portal, presents painstakingly researched, extensive information in the form of thousands of condensed pages. It offers the widest and most detailed information on subjects women care about.