How Fast Is AI Improving? Pattern Recognition Accuracy and Computational Power
PricewaterhouseCoopers, Accenture, and McKinsey predicted a few years ago that AI would create economic gains of about $15 trillion by 2030. As part of this huge economic benefit, they also predicted AI would revolutionize health care, energy, education, and logistics, among other industries. For instance, McKinsey’s predictions of a 10% improvement in energy efficiency in the UK and elsewhere were based on the purported successes of DeepMind and Nest Labs, both subsidiaries of Alphabet since 2014 and both of whose losses far exceeded their revenues in 2019. Not surprisingly, no independent confirmation can be found for these purported successes, and in reference to DeepMind’s claims, a 2019 Economist article notes that “some insiders say such boasts are overblown.”
Other AI startups are also unprofitable and are not having a big impact on economic productivity. Of six AI startups that have achieved unicorn status, only CrowdStrike has done an IPO, and its 2019 losses were equal to 30% of its revenues. Another unicorn, UiPath, laid off 300 employees, or more than 10% of its workforce, in November 2019, and four other large AI startups (DataRobot, Zoox, Zest AI, and Automation Anywhere) have reported layoffs this spring, layoffs that suggest they are losing money. China’s four largest AI startups are purportedly unprofitable, with Megvii’s losses 2.5 times larger than its revenues. Furthermore, in my analysis of the 41 largest American AI startups in terms of VC funding received, fewer than one-quarter offered products and services that directly impact productivity. Instead, most offered basic hardware and software; important, but not the type of products and services that suggest AI will soon have an impact on productivity.
Hype about AI for health care and logistics applications was also overblown, with IBM’s Watson making little headway in hospitals. In fact, the term IBM Watson has almost completely disappeared from media reports, and searches for it have declined significantly according to Google Trends. Although there are some promising reports about AI in medical imaging, these success stories are still a long way from widespread usage, and even if they become widely used, they will not come close to the level of success promised by proponents of AI a few years ago.
This is the eighth article in my series on startups and new technologies. It addresses the chances of success for image recognition, which is applicable to a wide range of applications including health care, driverless vehicles, biometrics, and augmented, virtual, and mixed reality. If we consider its more general category, pattern recognition, a broader set of applications exists, including the interpretation of legal, accounting, and scientific documents and speech recognition and synthesis by personal digital assistants.
Stanford’s AI Index, an assessment of AI’s progress, shines some light on progress in image recognition. The graph below comes from this report; it shows accuracy scores for image classification on the ImageNet dataset over time, which can be seen as a proxy for broader progress in supervised learning for image recognition. As shown in the figure, accuracy increased from 62% in 2013 to 86% in 2019, a seemingly large improvement in just six years. Should we be impressed? It depends on what level of accuracy is needed. Medical imaging and driverless vehicles require much higher accuracies than 86% if they are to work without intervention by humans, perhaps well above 99%. Using 99% as a guideline, image recognition accuracy is not so high, and improvements since 2016 total only six percentage points, from 80% to 86%.
Seen from this angle, the increases are probably not rapid enough to achieve the greater than 99% accuracy that we desire. I would be much more optimistic about AI if the above figure showed a log plot of inaccuracies versus time, with error rates falling below 10%, 1%, 0.1%, and so on over a 10-year period. This is what occurred with cost per transistor for microprocessors, cost per cell for memory chips, and other rapidly improving electronic technologies. The performance of these technologies is always plotted on a logarithmic scale because the rates of improvement are too rapid to show on a linear plot. Thus, the very fact that AI researchers are using linear plots is more evidence that the required high accuracies are many years if not decades in the future.
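The log-scale view suggested here can be sketched with the accuracy figures quoted above (62% in 2013, 80% in 2016, 86% in 2019). The fit and the extrapolation below are purely illustrative, my own back-of-the-envelope arithmetic rather than anything from the AI Index:

```python
import math

# Accuracy figures quoted above from the AI Index discussion
accuracy = {2013: 0.62, 2016: 0.80, 2019: 0.86}

# The "inaccuracies" (error rates) one would plot on a log scale
error = {year: 1.0 - acc for year, acc in accuracy.items()}

# Fit a simple exponential decay, error(t) = error(2016) * exp(-k*(t - 2016)),
# to the 2016-2019 span highlighted in the text
k = math.log(error[2016] / error[2019]) / (2019 - 2016)

# At that pace, how many years past 2019 until error falls below 1%
# (i.e., 99% accuracy)?
years_to_99 = math.log(error[2019] / 0.01) / k

print(f"annual decay rate k = {k:.3f}")
print(f"years from 2019 to 99% accuracy at this pace: {years_to_99:.1f}")
```

At this pace the fit implies roughly two more decades before error drops below 1%, consistent with the "many years if not decades" conclusion above.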
A second problem with the above figure is that much of the recent improvement has seemingly been achieved through increases in computational power, not through better algorithms. Computational power (petaflops × days) increased by eight orders of magnitude, or 100 million times, over a few years, a tremendous increase over such a short time. Yet the accuracy of image recognition rose by only a few percentage points during the same years.
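Pairing these two figures gives a rough sense of the compute cost of progress. The pairing is illustrative only, since the compute and accuracy numbers come from different measurements over roughly, not exactly, overlapping years:

```python
import math

# Figures from the text: compute up eight orders of magnitude while
# ImageNet error fell from roughly 20% to 14% (80% -> 86% accuracy)
compute_multiplier = 1e8
error_before, error_after = 0.20, 0.14

# How many halvings of the error rate did that compute buy?
halvings = math.log2(error_before / error_after)

# Implied compute multiplier needed to halve the error rate once
compute_per_halving = compute_multiplier ** (1 / halvings)

print(f"halvings of the error rate: {halvings:.2f}")
print(f"implied compute multiplier per halving: {compute_per_halving:.1e}")
```

On these numbers, even one halving of the error rate would require a compute increase of many orders of magnitude, one way to quantify the "not sustainable" assessment quoted next.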
Seen in this light, it is not surprising that the head of Facebook AI, Jerome Pesenti, says: “The rate of progress is not sustainable. If you look at top experiments, each year the cost is going up 10-fold. Right now, an experiment might be in seven figures, but it’s not going to go to nine or 10 figures, it’s not possible, nobody can afford that.”
The slowing of Moore’s Law likely also bears on this issue. Not only have reductions in transistor size and increases in the number of transistors per chip slowed; cost reductions have slowed as well. As shown in the figure below, the cost per transistor has not fallen since 2012, nor has the cost per logic gate fallen for transistors smaller than the 28 nm node introduced many years ago.
To be fair, we do not know how much of the few percentage points of improvement in image accuracy came from better algorithms and how much from more computational power. We do know that training time was reduced from 30 minutes to 2 minutes between mid-2018 and mid-2019, an impressive reduction (shown below). But reductions in training time are not surprising given the huge increases in computational power. More importantly, the fact that huge increases in computational power were implemented even as improvements in accuracy were limited to a few percentage points suggests that achieving the greater than 99% accuracy that we need will be extremely difficult unless there is a breakthrough in AI. Thus, the chances that AI-based medical imaging, driverless vehicles, or even speech recognition will achieve the levels of performance we desire in the next few years, or even the next decade, are small.
AI performs even worse at identifying human activities in videos, according to Stanford’s AI Index. There are some activities for which recognition is achieved with high levels of accuracy (>75%), including cheerleading, table soccer, dancing, and baton twirling. But there are many others for which the accuracies are lower than 15%, including shot put, high jump, gargling mouthwash, washing one’s face, drinking coffee, and smoking cigarettes. And the improvements in accuracy for many of the latter activities over the last four years are less than 5%. Given this poor performance, we are a long way from AI understanding video images.
Interestingly, Stanford’s AI Index did not provide data on speech recognition accuracy, despite the similarities between image and speech recognition (both are pattern recognition) and the rising level of interest in Siri, Echo, Google Assistant, and Alexa, the personal digital assistants from Apple, Amazon, and Google (and OpenAI’s recent release of GPT-3).
It did, however, provide a qualitative assessment of natural language processing, and that assessment does shed light on the performance of personal digital assistants. According to one summary: “We know now how to solve an overwhelming majority of the sentence- or paragraph-level text classification benchmark datasets that we’ve been able to come up with to date.” He continues: “We’re solving hard, AI-oriented challenge tasks just about as fast as we can dream them up. I want to emphasize, though, that we haven’t solved language understanding yet in any satisfying way.” This about summarizes my assessment of Google Assistant, which I have tried to use many times.
What can we expect from these assistants in the future? OpenAI’s release of GPT-3 has generated a lot of interest, with most of the focus on how far research has come, which is astounding. But the bigger question is how much further GPT-3 must improve before it becomes truly useful. Critics have pointed out that it cannot answer simple questions such as: Who was the American president in 1700? What number comes before 10,000? And what is left in a box after a shoe and a pencil are added, followed by the shoe being removed? A UCLA computer science professor compares it to a very tall building: “I think the best analogy is with some oil-rich country being able to build a very tall skyscraper. Sure, a lot of money and engineering effort goes into building these things. And you do get the ‘state of the art’ in building tall buildings. But … there is no scientific advancement per se.”
How much better can GPT get? Can its accuracy reach 99%, then 99.9%, and so on? Just as image recognition gained only a few percentage points of accuracy over the last few years, it is highly unlikely that GPT will achieve much more than this over the next few years and thus reach the greater than 99% accuracy that is needed. Even the Wall Street Journal, normally an optimist about new technologies, scientific advances, and capitalism, is pessimistic and argues that symbolic learning, a completely different approach from deep learning, is needed. While deep learning focuses on feeding machines enormous data sets so they can learn to recognize or re-create images or text passages, symbolic learning involves encoding knowledge and rules in a computing machine. As one industry expert argues: “What’s missing with today’s AI is we have to get beyond the level of the statistical correlations that deep learning models tend to learn.”
In summary, the slow rate of improvement in image recognition accuracy suggests that useful AI is much further in the future than many claimed five or ten years ago. This does not mean, however, that researchers have not made tremendous advances; quite the contrary. It is impressive that machines can now mimic so many human capabilities. However, in terms of commercial potential, which is of primary interest to many policy makers and entrepreneurs, the capabilities seem insufficient to radically change health care, driverless vehicles, or personal digital assistants in the next five to ten years.
AI and Economic Productivity: Expect Evolution, Not Revolution, IEEE Spectrum, March 2020.
What Drives Exponential Improvements?, California Management Review, Spring 2013. https://journals.sagepub.com/doi/abs/10.1525/cmr.2013.55.3.134
Elon Musk-backed OpenAI has created excitement with its new tool GPT-3, but the tool can’t answer simple questions such as: Who was the American president in 1700? Or what number comes before 10,000? GPT-3’s predecessor GPT-2 made headlines for being deemed “too dangerous to release” because of its ability to create text that is seemingly indistinguishable from text written by humans. GPT-3 is supposedly even better, with 175 billion parameters, more than 100 times the number in GPT-2. For me, the question is: can AI and machine learning improve productivity by doing something that currently requires lots of manual work? Being able to generate text is nice, but don’t we already have too much text sloshing around on social media? The same holds for any technology or innovation. https://mindmatters.ai/2020/07/gpt-3-is-mindblowing-if-you-dont-question-it-too-closely/