Artificial Intelligence (AI) seems to be everywhere these days, from driving your autonomous car to detecting diseases in clinical settings. It has permeated almost all fields and has evolved into a basic component of business growth. The term AI encompasses a wide spectrum of converging technologies: Machine Learning, Deep Learning, neural networks and Natural Language Processing (NLP).
Definition of Machine Learning:
Jason Brownlee sums it up like this: ‘The valuable part of machine learning is predictive modeling where we use historical data to train a model and use the model to make predictions.’
Tom Mitchell in his classic book on machine learning says:
‘The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience’
Stripped of confusing buzzwords, AI is basically a technique for finding patterns by training on a set of data. The key here is the data. Unless you start with the right data, and preprocess and clean it for learning, you will end up with poor results.
Moravec’s paradox: Hans Moravec observed that high-level reasoning requires very little computation, but low-level sensorimotor skills require enormous computational resources. Relatedly, some researchers were successful at writing programs that used logic, solved algebra and geometry problems and played games like checkers and chess. Logic and algebra are difficult for people and are considered a sign of intelligence. The researchers assumed that, having solved the “hard” problems, the “easy” problems of vision and commonsense reasoning would soon fall into place. They were wrong, and one reason is that these problems are not easy at all, but incredibly difficult. The fact that they had solved problems like logic and algebra was irrelevant, because these problems are extremely easy for machines to solve.
To quote from personal experience: I was having difficulty finding the right corpus to train our AI for our Athena - Simplified Health Information project. I was grumbling about this Artificial Intelligence problem when my partner blurted out, “That is because it is artificial; it is not a natural intelligence.”
I usually throw a piece of meat taken from the refrigerator to move my one-year-old German Shepherd to the backyard every evening. On most days, he sits there and watches me fetch the piece. I decided to fool him once: I touched the piece of meat but instead fetched a sliver of cabbage. He couldn’t have noticed the switch, as I did it with sleight of hand before throwing the piece out. He didn’t move from his position. He must have anticipated my switch somehow. I was clueless about how a dog could predict his owner’s behavior without any extra sensory input.
Maybe there is an inherent limit to our creative prowess, something like St. Augustine’s conundrum. There was a time St. Augustine sat near the seashore contemplating the mystery of the Holy Trinity when he saw a little child running back and forth from the sea to a spot on the seashore. The boy was using a shell to carry water from the sea and fill a small pit that he had made in the sand, or so it appeared to St. Augustine.
Augustine came up to him and asked him what he was doing.
‘I’m going to pour the entire sea into this hole,’ the boy replied.
‘What?’ said Augustine. ‘That is impossible, my dear child. The sea is so vast and the shell and the hole are so little.’
The boy replied, ‘It would be no more impossible than what you have been contemplating. It would be easier and quicker to draw all the water out of the sea and fit it into this hole than for you to fit the mystery of the Trinity and His Divinity into your little intellect; for the Mystery of the Trinity is greater and larger in comparison with your intelligence than is this vast sea in comparison with this little hole.’
We at TargetWoman have been working on Natural Language Processing since 2004, around the time we started. Simply put, Natural Language Processing is the process of using specialized computer algorithms to identify key elements in everyday language and extract usable meaning from unstructured input.
Microsoft says this about languages: “‘Understanding’ language means, amongst other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way.”
For example, the word “cell” can mean different things to different people. It can mean a prison cell, a compartment in a honeycomb, the smallest organizational unit of a movement, a component of a battery, a mobile phone or the smallest structural unit of an organism, to name a few of the definitions. The surrounding words help to pin down the true meaning of the word in relation to the context. In other words, you can decide the meaning of the word only after analyzing the entire sentence.
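To make the “cell” example concrete, here is a minimal sketch of picking a sense from the surrounding words. It is not our production engine: the sense names and cue words in `SENSES` are made up for illustration, and the method (counting overlapping cue words, in the spirit of the simplified Lesk approach) is an assumption chosen for brevity.

```python
# Hypothetical mini sense inventory for "cell"; cue words are illustrative only.
SENSES = {
    "prison": {"prison", "jail", "inmate", "guard", "locked"},
    "biology": {"organism", "membrane", "nucleus", "tissue", "blood"},
    "battery": {"battery", "voltage", "charge", "lithium", "power"},
    "phone": {"phone", "mobile", "call", "signal", "text"},
}


def disambiguate(sentence, senses=SENSES):
    """Return the sense whose cue words overlap most with the sentence, or None."""
    # Lowercase each word and strip surrounding punctuation.
    words = {w.strip(".,;:!?\"'").lower() for w in sentence.split()}
    # Score each sense by the number of cue words present in the sentence.
    scores = {name: len(words & cues) for name, cues in senses.items()}
    best = max(scores, key=scores.get)
    # No overlap at all means the context gives us nothing to go on.
    return best if scores[best] > 0 else None


print(disambiguate("The inmate paced inside his locked cell."))  # prison
print(disambiguate("Each cell has a nucleus and a membrane."))   # biology
```

A sentence with no cue words at all returns `None`, which mirrors the point above: without context, the word alone does not settle the meaning.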
We started working on the concept of Natural Language Progression way back in the year 2004, long before the term Artificial Intelligence became a buzzword. We worked on the premise that, to convey an idea or a thought, words are only a transport layer. It is not the word per se that conveys the idea or thought but the collection of words, their proximity to each other and the context. With that said, we turned our attention to using computers to understand the logical progression of our language, which some call Natural Language Processing. We have created a working model which can understand the subtlety of the language we speak, at least within the limited context of health and medical topics.
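The idea that proximity carries meaning can be sketched with a simple co-occurrence count: for a target word, tally which words appear within a few positions of it. This is only an illustration under our own assumptions (whitespace tokenization, a fixed window, the hypothetical helper name `cooccurrences`); it is not a description of how our engine is built.

```python
from collections import Counter


def cooccurrences(text, target, window=3):
    """Count the words that appear within `window` positions of `target`."""
    # Naive tokenization: split on whitespace, strip punctuation, lowercase.
    tokens = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    tokens = [t for t in tokens if t]
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target occurrence itself
                    counts[tokens[j]] += 1
    return counts


text = "Blood cell counts fall. A white blood cell fights infection."
print(cooccurrences(text, "cell", window=2).most_common(1))  # [('blood', 2)]
```

Words that repeatedly turn up near the target, such as “blood” here, are the kind of signal that hints at which meaning of “cell” is in play.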
Sample this: Our Natural Language Processing Engine when triggered by the ubiquitous word ‘sex’ comes up with the following words – pregnant with meaning (Natural Language Association):
Natural Language Navigation:
In TargetWoman, we have thousands of pages dedicated to different topics of interest to women. Navigation inside such a large repository of content is not always easy. A conventional way of navigation is to split the content into clearly defined topics such as health, home improvement, careers and travel, to name a few. But every topic may have dozens of pages, or in some cases hundreds. Most large websites address this issue with a site-wide search facility. We implemented such a search engine at the outset. But, as the server logs showed, it was not enough. A better method needed to be found, one which would take into consideration how people navigated inside a site. This is how we turned our attention to Natural Language Navigation. We fed the key topics of every single page to our Logical Progression Engine, another term for Natural Language Navigation. This in turn sorted the content based on ‘keywords’ and their frequency of occurrence. The frequency gives a strong indication of how closely a keyword relates to the nature of the content. This is how a typical search engine evaluates and indexes content. The data was updated every time a new page was published.
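Sorting content by keywords and their frequency, and updating the data whenever a page is published, can be sketched as a small inverted index. The class and method names below (`KeywordIndex`, `add_page`, `pages_for`) are hypothetical and the tokenization is deliberately naive; treat it as a rough illustration of the idea rather than our actual engine.

```python
from collections import Counter, defaultdict


class KeywordIndex:
    """Maps each keyword to the pages containing it, with occurrence counts."""

    def __init__(self):
        # keyword -> Counter({page title: number of occurrences})
        self.index = defaultdict(Counter)

    def add_page(self, title, text):
        """Index a newly published page (call once per page)."""
        for word in text.lower().split():
            word = word.strip(".,;:!?\"'()")
            if word:
                self.index[word][title] += 1

    def pages_for(self, keyword):
        """Return page titles for a keyword, most frequent use first."""
        counts = self.index.get(keyword.lower(), Counter())
        return [title for title, _ in counts.most_common()]


idx = KeywordIndex()
idx.add_page("Swiss Exercise Ball", "Exercise ball workouts: exercise daily, exercise safely.")
idx.add_page("Parenting", "Home exercise for students.")
print(idx.pages_for("exercise"))  # ['Swiss Exercise Ball', 'Parenting']
```

Note that a single keyword still matches the Parenting page here; frequency only pushes it lower in the list.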
A real world example here:
If you click on ‘Browse by Topics’ for an article, you get a list of related content. For example,
“Browse by Topics: Aerobic+Water+Exercise” would return the following content:
1. Home Exercise Equipment
2. Swiss Exercise Ball
3. Morning Exercise and Metabolism
4. Physical Fitness Exercise
5. Abdominal Exercise
6. Circuit Training
7. Xiser Workout
From the above abbreviated list, you can see that the NLN system picked out only content related to physical exercise and did not pick content from Parenting that contains the keyword ‘exercise’ (home exercise for students).
We will discuss how we collected the vast data and how we created the algorithm for this project in subsequent blog posts.
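One way a multi-keyword topic query could leave out a page that merely mentions ‘exercise’ is to require a page to match more than one of the query terms before ranking it. The sketch below is our guess at the general shape, not the actual NLN algorithm: the `min_terms` threshold, the scoring by summed frequency, the helper names and the sample page texts are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


def browse_by_topics(query, pages, min_terms=2):
    """Rank pages matching at least `min_terms` of the '+'-separated terms."""
    terms = [t.lower() for t in query.split("+") if t]
    results = []
    for title, text in pages.items():
        counts = Counter(tokenize(text))
        matched = [t for t in terms if counts[t] > 0]
        if len(matched) >= min_terms:
            # Score: total occurrences of the matched query terms.
            results.append((sum(counts[t] for t in matched), title))
    # Highest score first; ties broken alphabetically by title.
    results.sort(key=lambda r: (-r[0], r[1]))
    return [title for _, title in results]


# Made-up page texts, for illustration only.
pages = {
    "Water Aerobics": "Aerobic exercise in water is gentle; water exercise helps joints.",
    "Circuit Training": "Circuit training mixes aerobic exercise with strength work.",
    "Parenting": "Plan a home exercise for students after school.",
}
print(browse_by_topics("Aerobic+Water+Exercise", pages))
# ['Water Aerobics', 'Circuit Training']
```

The Parenting page matches only one of the three terms, so it is left out, which is the behavior described in the list above.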
Until that time, try our NLP here:
You can discuss in the comments section how our Natural Language Navigation helped you to find information on health topics.