Toyota is in hot water. The company recently announced recalls on seven U.S. and European models due to faulty gas pedals and braking systems. Not only is this a tremendous blow to the image of the world’s largest automaker, but it could also have a serious impact on its sales.
A drop in sales has already been registered: in January 2010, Toyota’s US sales declined by 8.7%, due in part to the removal of some defective models from the market. For the first time in 10 years, Toyota’s sales sank below 100,000 units, and in the first few days of February, the stock was down 5.69% on the Tokyo Stock Exchange.
Toyota is now focusing on winning back its customers and reassuring them of the quality and safety of its vehicles. But what about the Voice of the Customer? Could Toyota have foreseen the imminent crisis just by analyzing the opinions expressed online? Could it have had a notion of which aspects users were discussing and criticizing?
We used Cogito Monitor (our semantic software, which automatically processes content in detail) to find out what users have said about Toyota. We examined the comments posted on the most popular U.S. automobile blogs and forums, such as caranddriver.com, carforum.com and autoblog.com.
In 1968 Arthur C. Clarke introduced us to his computer named HAL. He had us believing that all we needed to do was talk to HAL. HAL would listen, understand and do what we wanted. That is, until HAL developed an evil soul and did nasty things to humans. The evil soul is pure fiction, but HAL is not.
2010 is the year we get to meet the real HAL. He may still be a child but he is growing up fast thanks to four trends in computing that have coalesced and are now ready to explode. These trends are The Cloud, The Pipe, The UI and The API. A depiction is below.
The Cloud is elastic computing power. It is more than renting a server from a service provider. It means automatic, on-demand scalability onto as many servers as are needed to accomplish a task or handle a sudden flood of customer needs. The Cloud gives an organization of any size the appearance and performance of Google-sized computing.
The Pipe is high-speed internet access, everywhere and all the time. Typical wired internet speeds today are over 6 Mbps (megabits per second), and wireless connections are quickly catching up, with 3G deployments common and 4G on the way. The biggest trend in mobile devices is the smartphone. These devices do more than route phone calls: they also manage email, calendars, music, applications and the entire internet. But of course, the processing power to do these things is not all on the device. Much of it is up in the cloud.
The UI, or User Interface, is smart. Speech-to-text and semantic technologies combine to create the appearance of intelligence. Computers or mobile phones spoken to in natural language understand the request and then locate, calculate, connect, tally and display the answer, rather than simply listing resources for you. Try Nuance, Vlingo or Google Mobile for speech-to-text accuracy. Try us at Expert System for semantic processing accuracy.
The API, or Application Programming Interface, means really useful applications. APIs package the first three trends so that creative types can quickly build applications for specific tasks, domains or verticals, and build lots of them. Look how many iPhone/iPod touch applications have been built in the last 2 years alone. Many have been built by individuals, not large corporations.
These four trends create a virtuous cycle. Together they suddenly lift computing onto a higher platform: one that engages the imagination, delivers enormous productivity, improves processes and creates new value out of existing information resources.
No, you can’t really see or touch HAL. But rest assured he is there, working in the background, growing, learning and getting smarter every day. He is ready to serve you. Just ask.
A few days ago, Microsoft announced its intent to abandon the development of a Linux/Unix version of FAST, the corporate search engine it purchased a couple of years ago. The decision didn’t really take anyone by surprise, since Linux is Windows’ only real rival in the server world. Obviously, Microsoft has no interest in developing solutions for a competitor’s operating system, not to mention that FAST is increasingly integrated with SharePoint, which only reinforces my point.
From a strategic point of view, the choice is quite understandable. But from a sales standpoint, it seems to be an enormous sacrifice and a huge opportunity for the competition. From what I have read, at least half of FAST users run Linux/Unix (some say it’s closer to 80%). This means that these users will have to adopt another company’s search engine should they decide to change theirs. With this in mind, I think Microsoft would have been better off continuing development on systems other than Windows. However, when I consider that our search engine is compatible with Linux, which gives us more sales opportunities, I think they made the right decision.
I recently read quite a few interesting articles about Twitter. The most intriguing (and exciting) was about the first tweet from outer space. At the moment, the concept of an interplanetary World Wide Web resides in the minds of few earthlings ;-), but there are already hypothetical plans for web servers to be hosted on Mars and on the Moon! Last week, Twitter’s effects on crowdsourcing were addressed by Alec Ross and Jared Cohen in a chat moderated by Google’s Eric Schmidt, in which social networks in general were discussed. But what really caught my eye was an article reporting the live coverage of an in-flight incident that was averted. Apparently, a man attempted to open the plane’s exit door but was promptly stopped by other passengers. Among those on board was the General Services Administration’s CIO, who sent out three tweets as the action took place and, in less than 300 characters, created a sensational news story.
E-mails, text messages and social networks are some of the most innovative communication instruments today. The advantage of the text message is that it is simple and accessible to everyone (in fact, tens of millions of messages are sent every day). These messages could certainly become functional and immediate channels for public involvement in safety issues. Citizens could use these systems around the clock to report events and situations as they happen, so that the public could be better served and numerous criminal acts could possibly be avoided.
The potential risk, however, is that these messages will go unacknowledged or, even worse, that they will be taken into consideration when it’s too late. For this reason, once citizens are offered the opportunity to participate directly, it is essential that law enforcement be ready and prepared to listen to them. The complication is that enormous quantities of information need to be managed efficiently. Semantic technology can be used to resolve this problem: it is able to support data collection and analysis and can quickly sort through messages, thanks to its ability to “understand” text. In this situation, it could easily be applied to a system that allows citizens to use social networks, e-mails or mobile services to report crimes or alert officials to neighborhood situations such as broken streetlights, potholes and vandalism.
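To make the sorting step a little more concrete, here is a minimal sketch that routes incoming citizen reports into queues by matching them against small, hand-made trigger-word lists. The categories, the trigger words and the functions `route_report` and `triage` are all hypothetical, and plain word matching is only a crude stand-in for the text understanding a semantic engine would provide; it is not how Cogito works.

```python
import re
from collections import defaultdict

# Hypothetical categories, each with a few hand-picked trigger words.
# A semantic engine would rely on concepts rather than fixed word lists.
CATEGORIES = {
    "streetlight": {"streetlight", "streetlights", "lamp", "dark"},
    "pothole": {"pothole", "potholes", "hole", "crater"},
    "vandalism": {"vandalism", "graffiti", "smashed", "vandalized"},
}


def route_report(text):
    """Return the category whose trigger words appear most often, or 'unsorted'."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {name: sum(w in triggers for w in words) for name, triggers in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unsorted"


def triage(reports):
    """Group reports by category so each queue can go to the right office."""
    queues = defaultdict(list)
    for report in reports:
        queues[route_report(report)].append(report)
    return dict(queues)


reports = [
    "Streetlight out on Elm St, the whole block is dark",
    "Huge pothole near the school entrance",
    "Someone smashed the bus shelter and sprayed graffiti",
    "Loud music after midnight",
]
for category, items in triage(reports).items():
    print(category, len(items))
```

Anything that lands in "unsorted" is exactly the kind of message that would still need a human, or a smarter engine, to read it.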
Instant blogging has forever changed the life of new generations, but can it also revolutionize public safety? I believe it will, and I believe that the real enabler behind this revolution will be semantic technology.
In science we have tackled great problems. Only a few years ago we mapped the human genome. Imagine unlocking the code of what makes us human. More recently, scientists have been studying how proteins operate, or more precisely, how they fold. It is in the folding that we learn what a protein is intended for and what job it is supposed to do. Once we unlock this, we will know how diseases form and replicate and, most importantly, how to beat them… all of them.
So what does the information science of semantics have to do with proteins? Semantics fold too. That’s what.
Scientists studying how proteins fold are discovering their most important and elemental attributes.
The same is true of semantics. Boil a sentence down to its most elemental parts and you get what is called a triple: a subject, a predicate and an object. Consider the sentence below:
“John works in the White House”.
Subject: Who or what does the sentence describe? Obviously, that would be “John”.
Predicate: What is the property that describes or connects the subject to the rest of the sentence? That would be the verb “works”.
Object: What is the value of the property? That would be “White House”.
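In code, a triple is nothing more than a three-field record. A minimal Python sketch, with the extraction done by hand (no parser is involved, and the `Triple` type is our own illustration, not part of any product):

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement: the elemental unit of a sentence."""
    subject: str
    predicate: str
    object: str


# Hand-extracted triple for "John works in the White House".
triple = Triple(subject="John", predicate="works", object="White House")
print(triple)  # Triple(subject='John', predicate='works', object='White House')
```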
That example is pretty easy. What about a longer sentence, something like this:
“John, a favorite of President Obama from his days in Chicago, now works as public liaison in the White House”.
Now the job is tougher. It is clear John is still the subject of the sentence. It might be tempting to assign “favorite” as the predicate since it connects John to President Obama. But the commas indicate to us that this is really a clausal description of John and not the central action of the sentence. So we are left with “works” as the predicate. But what does “works” connect to? Is it “public liaison” or “White House”? The stronger connection is “public liaison” since this describes the kind of work John does. The White House is just the location of that work so it is nothing more than a qualifier.
When we learned to read as children, we were taught to reason through example sentences pretty much as I just described. Of course, you don’t think about it very deeply; the understanding of the sentence, its essence, comes naturally: John – works – public liaison. The rest just colors these most important facts.
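One way to keep the essence and the coloring apart is to store the core triple alongside a set of qualifiers. A minimal sketch, assuming a hypothetical `Statement` record and made-up qualifier keys ("location", "description"); the reading of the sentence is the one worked out by hand above:

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """A core triple plus optional qualifiers that only 'color' it."""
    subject: str
    predicate: str
    object: str
    qualifiers: dict = field(default_factory=dict)

    def core(self):
        """Return just the subject-predicate-object essence of the sentence."""
        return (self.subject, self.predicate, self.object)


# Hand-built reading of the longer sentence about John.
john = Statement(
    subject="John",
    predicate="works",
    object="public liaison",
    qualifiers={
        "location": "White House",
        "description": "a favorite of President Obama from his days in Chicago",
    },
)
print(" - ".join(john.core()))  # John - works - public liaison
```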
Semantics is the information science of establishing the meaning of text without human intervention, and this includes establishing the triple of any sentence. This is also the idea behind what is called the Semantic Web, or Web 3.0. In a diagram, this basic notion is sometimes represented like this:
You will note that this diagram looks much like cells or proteins linked together. There is a reason for that. Just as proteins fold and match up along their common edges in order to do their work, so do semantic triples. Switching to a protein example, let’s consider these two sentences:
1. Protein X adds two molecules of zinc to the cell for each molecule of oxygen.
2. Protein Y adds one molecule of copper to the cell for each molecule of iron.
Our diagram now looks like this:
So what happened? Each sentence has its own triple. But they have a common predicate of “adds”. So we can diagram two subjects and two objects but with a common predicate.
Just like proteins that fold and combine to make something new, we have done the same here in the science of semantics. Because we boiled the sentences down to triples and stored them in a place that can be queried, we can ask for all triples whose predicate matches “add(s)”.
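A toy version of such a queryable store can be sketched in a few lines. This assumes a hypothetical in-memory `TripleStore` indexed by predicate, and simplifies the two sentences by hand (dropping the quantities and the "to the cell" phrase) so each object is just the molecule:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


class TripleStore:
    """A toy in-memory store of triples, indexed by predicate."""

    def __init__(self):
        self._by_predicate = {}

    def add(self, triple):
        self._by_predicate.setdefault(triple.predicate, []).append(triple)

    def with_predicate(self, predicate):
        """All stored triples whose predicate matches exactly."""
        return list(self._by_predicate.get(predicate, []))


store = TripleStore()
# Simplified, hand-extracted triples for sentences (1) and (2).
store.add(Triple("Protein X", "adds", "zinc"))
store.add(Triple("Protein Y", "adds", "copper"))

for t in store.with_predicate("adds"):
    print(t.subject, "->", t.object)  # Protein X -> zinc, then Protein Y -> copper
```

Indexing by predicate keeps the "common predicate" lookup cheap; a real triple store would also index subjects and objects.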
Why is this important? It gives scientists, researchers, business professionals and citizens a chance to tap into their documents, email or the web and glean true meaning from them. This is far different from a Google-like keyword match. The word “add(s)” certainly matched, but so did the word’s role in the sentence.
But what if the author of sentence (1) did not use the word “adds” but instead used the word “increased”? A keyword match would fail here. Semantics, however, can understand that “add” and “increase” are related, so the query would lead to the same scientific discovery: proteins that add/increase molecules.
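The difference between the two kinds of match can be sketched by mapping verb forms onto a shared concept before comparing them. The `CONCEPT_OF` table below is a hypothetical, hand-written stand-in for the relations a semantic network would supply; everything else is plain Python:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Hypothetical map from surface verb forms to a shared concept.
# A semantic network would provide these relations; here they are hard-coded.
CONCEPT_OF = {
    "add": "ADD", "adds": "ADD", "added": "ADD",
    "increase": "ADD", "increases": "ADD", "increased": "ADD",
}


def concept(verb):
    """Map a verb form to its concept, falling back to the word itself."""
    return CONCEPT_OF.get(verb.lower(), verb.lower())


triples = [
    Triple("Protein X", "increased", "zinc"),  # author of (1) wrote "increased"
    Triple("Protein Y", "adds", "copper"),
]


def keyword_match(triples, word):
    return [t for t in triples if t.predicate == word]


def concept_match(triples, word):
    target = concept(word)
    return [t for t in triples if concept(t.predicate) == target]


print(len(keyword_match(triples, "adds")))  # 1: only Protein Y
print(len(concept_match(triples, "adds")))  # 2: both proteins
```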
Now let’s change sentence (2) from Protein Y to Protein X. A more restrictive query on a store of triples, where you ask for matches on both subject and predicate, would result in a diagram like the one below.
Again, why is this important? Because now a scientist can rely on the smarts built into such a search index to deliver every statement in which Protein X adds/increases [some kind of] molecule to a cell. The interesting thing for the scientist will be to group and sort the kinds of molecules that are added to the cell.
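The restrictive query and the grouping step might look like this. It is a sketch under the same assumptions as the toy examples above (a hand-written `CONCEPT_OF` table, hand-extracted triples); the third and fourth triples are made-up extra statements added only so the grouping has something to count:

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Hypothetical verb-to-concept table standing in for a semantic network.
CONCEPT_OF = {"add": "ADD", "adds": "ADD", "increases": "ADD", "increased": "ADD"}


def concept(verb):
    return CONCEPT_OF.get(verb.lower(), verb.lower())


triples = [
    Triple("Protein X", "adds", "zinc"),       # sentence (1)
    Triple("Protein X", "adds", "copper"),     # sentence (2), now about Protein X
    Triple("Protein X", "increased", "zinc"),  # made-up extra statement
    Triple("Protein Z", "adds", "iron"),       # made-up statement about another protein
]


def query(triples, subject, predicate):
    """Triples matching both the subject and the predicate's concept."""
    target = concept(predicate)
    return [t for t in triples if t.subject == subject and concept(t.predicate) == target]


def group_by_object(matches):
    """Count how often each kind of molecule is added, most frequent first."""
    counts = defaultdict(int)
    for t in matches:
        counts[t.object] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(group_by_object(query(triples, "Protein X", "add")))
# [('zinc', 2), ('copper', 1)]
```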
This is real discovery in science. It is semantics that gets language out of the way. It is semantics that builds smarts into a system so the scientist can find, analyze and create new cures for diseases that have yet to be worked on effectively. So… semantics and folding proteins do have a lot in common, more than you thought.
As I have written many times before, semantic technology is unique in that it is able to go beyond the limits of other types of technology and approach the automatic understanding of a text. It is not perfect, however, and it certainly has yet to reach its maximum potential.
I realize that it’s not easy for those who don’t work in the sector to understand this (especially since there are so many false promises out there, which tend to create unreasonable expectations, muddled ideas and market chaos). Therefore, it might be useful to use a common experience as an example: our own learning process.
Let’s start from the beginning: from the moment we (human beings) begin to talk, understand, learn, go to school, etc. We need at least 12-15 years to be able to read a newspaper and understand the most general articles, and this is thanks to the experience we build up while learning the meanings of words and experimenting with a great many different phrase constructions. The learning process takes even longer when we decide to tackle more technical terms or specific topics.
Learning takes time, and the same goes for a computer. It’s true that a computer can process in nanoseconds while we think in milliseconds, but it is also true that our method of learning uses a device (the brain) that no one has been able to fully understand and that is able to do things that not even the most powerful computer can imitate.
In summary, it doesn’t make sense to expect a computer to perfectly analyze and understand a biology text, for example, without first having learned all it can about that subject. There are no shortcuts or magic formulas: learning a language is difficult, and even automatic processes require time and labor.
This week we announced the appointment of Julie Hartigan, Ph.D. as CTO of Federal Programs, and Rita Joseph as Vice President of Federal Programs. The expansion of our executive team here in North America is directly in line with our overall goals and vision for growth in the U.S.
Julie and Rita have the extensive experience needed to help us drive our federal program initiatives. And we’re confident that in an era when government seeks to “connect the dots,” both of these seasoned veterans will bring expertise, guidance and our advanced, high-speed, multilingual semantic processing to federal government agencies.
Wet morning in Santa Clara. People seem to be looking at the sky as if it were falling. We are not used to so much rain here.
There are not many people at the conference. The audience is an interesting mix of semantic geeks, marketing and product managers, and business people. Definitely a very heterogeneous crowd.
The most interesting presentations are by Scott Prevost of Microsoft Bing and Mark Greaves from Vulcan.
Scott Prevost comes from the Powerset acquisition by Microsoft and is now part of the Bing project.
“The Semantic Web? It is already here,” he says. What he really means is that the Bing project makes extensive use of semantic technology like the kind we offer at Expert System. In his opinion, semantics, which is already applied under the hood in all major search engines, is here to stay and will gradually evolve to make the user’s search experience better, most of the time without end users even realizing that they are using semantic technologies!
Bing applies semantics in a lot of different ways:
They interpret user queries semantically. Example: “who mocked Sarah Palin” returns not only results containing “Sarah Palin” and “mocked”, but also “parodied”, “impersonated”, etc. We at Expert System provide similar functionality for the Enterprise market with Cogito Answers.
They classify the search results so that they can be filtered and navigated in a better way by the user – similar to what we can do with the Cogito Categorizer.
They try to leverage RDF information that publishers add to their pages, similar to Google’s rich snippets. This information can be added to a search result to make it more interesting to the user and improve the search experience. A classic example is a search for a restaurant returning the Yelp page with the average score and the number of reviews. We can help publishers produce these snippets automatically using our Cogito Discover technology.
They apply semantics to their advertising platform so that the advertisement campaigns can be based on concepts instead of keywords as they are today. We offer a similar solution with our Cogito Advertiser product.
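The first point above, interpreting “who mocked Sarah Palin” so that “parodied” or “impersonated” also count, can be sketched as simple query expansion. The `RELATED` table, the stopword list and all function names are hypothetical; neither Bing’s nor Cogito’s actual resources are shown here, and real semantic interpretation goes well beyond a synonym lookup:

```python
import re

# Hypothetical related-terms table. A semantic network would supply these
# relations; the real resources behind Bing or Cogito Answers are not shown.
RELATED = {
    "mocked": {"parodied", "impersonated", "ridiculed"},
}
STOPWORDS = {"who", "what", "the", "a", "an"}


def words(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_groups(query):
    """Plain keyword search: each content word must appear literally."""
    return [{w} for w in words(query) if w not in STOPWORDS]


def expanded_groups(query):
    """Expanded search: each content word may be replaced by a related term."""
    return [{w} | RELATED.get(w, set()) for w in words(query) if w not in STOPWORDS]


def matches(document, groups):
    """A document matches when every group has at least one word in it."""
    doc_words = set(words(document))
    return all(doc_words & group for group in groups)


doc = "A comedian impersonated Sarah Palin on late-night TV."
query = "who mocked Sarah Palin"
print(matches(doc, keyword_groups(query)))   # False: "mocked" never appears
print(matches(doc, expanded_groups(query)))  # True: "impersonated" is accepted
```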
Another interesting speaker is Mark Greaves from Vulcan Technologies. One of his most interesting points is that a lot of data that used to live in databases around the world is now moving into the “Semantic Web”. The advantages are huge:
Linking the data: Think about relational databases and how you could link one piece of data in one database to another database (perhaps belonging to a different organization). It may not be impossible, but it is at least very difficult. One basic advantage of the Semantic Web is that data can be linked in all sorts of ways. The OWL standard in particular provides the means to connect data in different “clouds” very easily.
“Organic growth” of the data: The Semantic Web also allows for “organic growth” of data. Whereas with relational databases you need to define a schema before you even start entering any data, the Semantic Web is designed to provide the flexibility to add and modify data in different formats at different points on the web. Open data also usually comes with a community that maintains it and makes sure it is accurate.
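Both advantages can be illustrated with plain Python sets of triples. `owl:sameAs` is the real OWL property stating that two URIs denote the same thing (it is symmetric, so the sketch follows it in both directions); everything else here, including all the example.org/example.com URIs and predicates, is made up, and a real application would use an RDF library and SPARQL rather than hand-rolled loops:

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # the real OWL identity property

# Two tiny datasets from different publishers, as (subject, predicate, object)
# triples. All example.org / example.com URIs are made-up placeholders.
cloud_a = {
    ("http://a.example.org/city/42", "http://a.example.org/name", "Santa Clara"),
    ("http://a.example.org/city/42", "http://a.example.org/state", "California"),
}
cloud_b = {
    ("http://b.example.com/venue/7", "http://b.example.com/locatedIn", "http://b.example.com/place/sc"),
    # The link: publisher B declares its place identical to publisher A's city.
    ("http://b.example.com/place/sc", SAME_AS, "http://a.example.org/city/42"),
}


def merge(*clouds):
    """Linking datasets is a set union of triples; no shared schema is needed."""
    graph = set()
    for cloud in clouds:
        graph |= cloud
    return graph


def aliases(graph, node):
    """Every node connected to `node` through owl:sameAs, in either direction."""
    seen, stack = {node}, [node]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if p == SAME_AS and current in (s, o):
                other = o if s == current else s
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return seen


def values(graph, node, predicate):
    """Objects of `predicate` for `node` or any of its owl:sameAs aliases."""
    names = aliases(graph, node)
    return {o for s, p, o in graph if s in names and p == predicate}


graph = merge(cloud_a, cloud_b)
(place,) = values(graph, "http://b.example.com/venue/7", "http://b.example.com/locatedIn")
print(values(graph, place, "http://a.example.org/state"))  # {'California'}

# "Organic growth": a fact with a brand-new predicate needs no schema change.
graph.add(("http://a.example.org/city/42", "http://a.example.org/county", "Santa Clara County"))
```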
There are also some recurring themes at the conference that seem to be common in many of the talks:
- Mobile Internet: Internet on mobile devices presents some specific challenges. The environment is different (e.g. no full-size keyboard or mouse, and a much smaller browser). The market is huge, and so are the opportunities. Search engines, social networks and content providers are discussing how to use semantics to develop this new space.
- “Internet of Data”: the huge amount of Linked Open Data that is available for free today on the Internet represents a new and ever growing opportunity that can be leveraged by computer programs to help us humans in our daily tasks.
- Social Networking Interaction: this is a concept that seems to mean different things to different people. Some people talk about how social networks can be represented in a “semantic” way with RDF so that they can be used by semantic web applications. Others talk about the way people in social networks contribute to publishing and maintaining data in the Linked Open Data Cloud, much as the Wikipedia community has built the huge Wikipedia knowledge base over the last few years.
The bottom line is that the Semantic Web is already here, and the ideas discussed at Web 3.0 are mostly about how to leverage it to make our lives better…
by Walter Pezzini, VP of Pre-Sales and Professional Services at Expert System
When I present a company with our software solutions (which are based on a semantic technology that uses a rich and vast semantic network), I find myself in front of an audience who clearly understands the advantages of this approach. Yet, the series of concerns and doubts they raise often clouds the decision-making process and causes an incorrect evaluation of the actual return on investment.
Whether they are raised by IT managers, KM workers or software developers, the concerns fall into two categories: the first, the costs related to the setup and maintenance of the semantic network and the second, the costs related to the infrastructure required to maintain a performance level able to satisfy operations.
There are many reasons behind these concerns, but two factors seem to stand out. On one hand, there are the excellent (and often misleading) communication activities carried out by the makers of systems based on keyword technology. They have almost succeeded in convincing the market that a complex problem such as information management can be solved with automatic shortcuts and that any alternative would be unaffordable. On the other hand, the majority of researchers in this sector are still skeptical about fully semantic systems. This is mainly due to their inability (at least up to now) to develop software that combines the advantages of deeper text comprehension with the performance needed to meet real-world demands (which further strengthens the position of the competition).
In the past ten years, many successful projects have been developed using our semantic technology. Therefore, I think it would be useful to use real data from our everyday experiences to help clear up the misconceptions which often cause people to make irrational decisions.
Costs of development
To add a new language to Cogito, two man-years of software development and 8-10 man-years of linguistic development are needed to refine the semantic network. You can quickly estimate the cost of such resources (if you are in Silicon Valley, divide your estimated total by 2!) and immediately understand that the initial investment is considerable, yet affordable, considering that the cost will be spread over all the implementations done over time.
Cogito’s standard semantic network permits horizontal management of content, so that a significantly higher rate of precision and recall (compared to that obtained from a static system) is achieved with no need for further work. For vertical implementations, start-up costs are necessary so that the standard semantic network can be enriched with knowledge from a specific domain (the number of added concepts usually does not exceed 5,000); a linguist usually needs 20-30 working days to complete this task.
For those who believe that “languages constantly change and adding new terms can be costly,” may I remind you that even the most dynamic languages, such as English, grow by no more than 100-200 new terms (of common use) and fewer than 1,000 non-idiomatic expressions per year (in the worst-case scenario, this could mean about 10 working days per year).
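The estimate suggested above is simple arithmetic, sketched here with the effort figures from the text. The yearly cost per man-year, the number of working days in a year and the number of implementations sharing a language are hypothetical inputs you would replace with your own:

```python
# Figures from the text: 2 software + 8-10 linguistic man-years per language,
# 20-30 linguist days per vertical domain, about 10 days/year of updates (worst case).
SOFTWARE_MAN_YEARS = 2
LINGUISTIC_MAN_YEARS = (8, 10)
VERTICAL_SETUP_DAYS = (20, 30)
YEARLY_UPDATE_DAYS = 10
WORKING_DAYS_PER_YEAR = 220  # assumption; adjust to local practice


def day_cost(days, cost_per_man_year):
    """Cost of a number of working days at a given yearly cost per person."""
    return days * cost_per_man_year / WORKING_DAYS_PER_YEAR


def estimate(cost_per_man_year, implementations):
    """Rough cost picture; `implementations` is how many projects share the language."""
    low = (SOFTWARE_MAN_YEARS + LINGUISTIC_MAN_YEARS[0]) * cost_per_man_year
    high = (SOFTWARE_MAN_YEARS + LINGUISTIC_MAN_YEARS[1]) * cost_per_man_year
    return {
        "new_language": (low, high),
        "new_language_per_implementation": (low / implementations, high / implementations),
        "vertical_setup": tuple(day_cost(d, cost_per_man_year) for d in VERTICAL_SETUP_DAYS),
        "yearly_updates": day_cost(YEARLY_UPDATE_DAYS, cost_per_man_year),
    }


# Hypothetical inputs: 100,000 per man-year, language reused across 50 projects.
for item, value in estimate(100_000, implementations=50).items():
    print(item, value)
```

The per-implementation line is the point of the argument: the 10-12 man-year investment is paid once and shared, while vertical setup and yearly updates are measured in days.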
Those who criticize the complexity of managing a semantic network often refer to the complexity of managing lists of entities such as people, places, companies and organizations. Traditional systems can recognize an entity only if it is present in a list, and this is often mistakenly confused with semantic network management. A good semantic engine can recognize an entity based on the semantic role it plays within a text, so it requires neither the creation nor the maintenance of lists. At the same time, it can also correctly recognize less frequent entities (which, for obvious reasons, have not been inserted in any list).
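The contrast between list lookup and recognition from context can be illustrated with a deliberately crude sketch. The list-based function finds only names someone entered; the second function treats capitalized words in the object slot of a few hypothetical trigger phrases as organizations. This is only a regular-expression heuristic standing in for the idea; a real semantic engine analyzes the structure of the whole sentence rather than matching fixed phrases, and "Zorvex Labs" is a made-up company:

```python
import re

# List-based recognition: only names someone has entered are ever found.
GAZETTEER = {"Microsoft", "Google", "Toyota"}


def list_based(text):
    return {name for name in GAZETTEER if name in text}


# Crude stand-in for role-based recognition: capitalized words that fill the
# object slot of a few hypothetical trigger phrases are treated as organizations.
ORG_SLOT = re.compile(r"\b(?:works for|CEO of|acquired by)\s+([A-Z][\w&-]*(?:\s+[A-Z][\w&-]*)*)")


def context_based(text):
    return set(ORG_SLOT.findall(text))


text = "Maria works for Zorvex Labs in Turin."  # Zorvex Labs is a made-up company
print(list_based(text))     # set(): not in the list
print(context_based(text))  # {'Zorvex Labs'}: found from its role in the sentence
```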
Costs of infrastructure
Cogito can analyze more than 120KB of text (about 40 pages) per second on a common single-processor server. This speed, combined with its linear scalability and low cost, makes Cogito a practical solution even when large quantities (tens of millions) of documents must be analyzed.
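To put the throughput figure in perspective, here is the back-of-the-envelope calculation for a large batch. The 120KB/s and 40-page figures come from the text (120KB over 40 pages works out to about 3KB per page); the workload of 10 million two-page documents and the 8-server cluster are hypothetical. Since the quoted speed is "more than" 120KB/s, these times are upper bounds:

```python
# Figures from the text: more than 120 KB of text (about 40 pages) per second on one
# single-processor server, with linear scalability across servers.
THROUGHPUT_KB_PER_S = 120
KB_PER_PAGE = THROUGHPUT_KB_PER_S / 40  # so roughly 3 KB per page


def hours_to_process(num_docs, avg_doc_kb, servers=1):
    """Wall-clock hours, assuming throughput grows linearly with the number of servers."""
    total_kb = num_docs * avg_doc_kb
    seconds = total_kb / (THROUGHPUT_KB_PER_S * servers)
    return seconds / 3600


# Hypothetical workload: 10 million documents of about 2 pages each.
avg_kb = 2 * KB_PER_PAGE
print(round(hours_to_process(10_000_000, avg_kb), 1))             # 138.9 (just under 6 days)
print(round(hours_to_process(10_000_000, avg_kb, servers=8), 1))  # 17.4
```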
The development and maintenance costs of a semantic network are considerably lower than what is commonly assumed; the improvements in terms of the ability to manage information (even when very complex) are obvious even to those who are not experts in this sector. I am convinced that when these aspects can be objectively analyzed (when myths and obsolete information are ignored), the number of companies which adopt real semantic solutions will increase.
I usually don’t talk much about the technical aspects of linguistics or semantics, but I would like to draw your attention to www.phrasedetectives.org. This website uses a game format to gather useful material for refining algorithms that resolve anaphora and coreference.
Since this material could also be useful for us, and those who dedicate some time to it can also win prizes, I thought it would be nice to point it out.