In writing about the latest version of Google's voice search that is now available for iOS devices, I came across many references to the differences between how Apple's Siri and Google's product handle voice recognition. It seems clear that these architectural decisions are in large part responsible for the speed differential between the two applications.
Simply put, Google Voice Search, which is a feature of the Google Search app, performs the voice recognition for each query on the client side, while Apple's Siri processes these requests on the server side. This means that when you push the microphone icon on the Google app and start talking, the software process required to understand what you are saying occurs on the device itself. When you perform the equivalent action with Siri, your device passes that information to a remote server, which processes the request and then returns an answer, in pieces, back to your device. As you add words to your query, Siri adjusts the output until there is enough of a lull that it is convinced you are done. The advantage of Apple's method is that it enables "server-side learning," so the system gets smarter overall the more it is used. The disadvantage is that, depending on how long the query is and how clearly you speak, there can be a lot of back and forth (HTTP requests) to get an answer. In actual use, this distinction causes a noticeable lag in Siri's response time compared to the almost instantaneous recognition from Google's app.
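To make that back-and-forth concrete, here is a rough TypeScript sketch of the server-side pattern as described above. The endpoint URLs, payload shapes, and function name are hypothetical stand-ins, not Apple's actual API; the only point is the shape of the traffic, with one round trip per chunk of speech and the transcript refined until the speaker pauses.

```typescript
// Illustrative sketch only: the endpoints and payloads are assumptions, not a
// real API. What matters is the request pattern.
async function serverSideVoiceQuery(
  audioChunks: AsyncIterable<ArrayBuffer>
): Promise<string> {
  let transcript = "";

  // Each chunk of captured speech is shipped to the server, which returns its
  // best guess at the transcript so far: one HTTP round trip per chunk.
  for await (const chunk of audioChunks) {
    const res = await fetch("https://voice.example.com/recognize", {
      method: "POST",
      body: chunk,
    });
    transcript = await res.text();
  }

  // Only after a long enough lull is the finished transcript sent as the query.
  const answer = await fetch(
    "https://voice.example.com/answer?q=" + encodeURIComponent(transcript)
  );
  return answer.text();
}
```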
When you get under the hood of what makes for great app experiences, this is the kind of stuff you come across. The design, the UI, is what you see of an app, but the underlying architecture, and how it affects performance, is what you feel. Before discussing this further, I want to make a short detour into the world of actual architecture.
A friend of mine studied with the great Spanish architect Rafael Moneo at Harvard. The Pritzker Prize-winner gave a lecture in the late '80s where he described, in his thick Spanish accent, one of his own buildings as being "about sickness and depth." His students scratched their heads and spent hours trying to puzzle out this enigmatic utterance until they realized that he had actually said "thickness and depth!"
When my friend told me this, over Taiwanese food in Boston, I had a bit of an a-ha moment. I was on my way to a geeky Node.js "Framework Smackdown" hosted at Brightcove. The discussions all hinged on a distinction that has become more important than the one between HTML5 web apps and native apps: the "thickness" of the client. See developer Eric Sink's recent discussion of this perennial debate, and also the concluding section of this chestnut from Facebook's James Pearce, for more detail. Sink, and ultimately Pearce, focus their arguments on the evolving nature of the client side of the equation, which lets developers be more specific about architectural decisions than just saying, "it depends." Most of the progress in web and app development over the past 5-7 years has come from "thickening" the client side of the equation and from making the data flows between client and server asynchronous. AJAX (Asynchronous JavaScript and XML) was the first and most famous expression of this, and Node.js is perhaps the best-known recent example of the trend.
But in response to Moneo's quote above, it occurs to me that the two important parameters to look at in the design of an app's architecture (the relationship between what happens on the server and what happens on the client, and how to handle the back and forth) are the optimal "thickness" of the client in relation to the "depth" of the underlying data. In the case of the present face-off, both apps are accessing a tremendous depth of data. Google's data is significantly deeper and wider, but in terms of design decisions, both are effectively bottomless. An obvious thought is that since search is an "offboard" activity anyway, why "onboard" any part of the process?
The answer comes down to the programming principle of separation of concerns. Making the voice recognition a function of the device's own hardware makes that aspect of the application asynchronous with the actual database query. While Siri is going back and forth multiple times, Google resolves the statement of the query on the client and then makes a single request to the server. I will write about this more in a forthcoming post, but one of the things that characterizes mobile devices, in contrast to desktop computers, is the amount of interaction data (whether voice or touch) that flows between the user and the device. The way Google Voice Search optimizes for this makes it a much zippier choice.
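For contrast, here is a sketch of the client-side pattern under the same assumptions; recognizeOnDevice is a hypothetical stand-in for whatever on-device speech engine the app ships with, and the search endpoint is invented. The separation of concerns shows up as two independent steps: understanding the speech happens locally, and only the finished query touches the network.

```typescript
// Illustrative sketch only: recognizeOnDevice is a hypothetical stand-in for an
// on-device speech engine, and the search endpoint is assumed.
declare function recognizeOnDevice(audio: ArrayBuffer[]): Promise<string>;

async function clientSideVoiceQuery(audio: ArrayBuffer[]): Promise<string> {
  // Step 1: resolve the words locally. No network traffic is involved, and this
  // step is decoupled from (asynchronous with) the actual database query.
  const transcript = await recognizeOnDevice(audio);

  // Step 2: a single request to the server with the finished query.
  const answer = await fetch(
    "https://search.example.com/answer?q=" + encodeURIComponent(transcript)
  );
  return answer.text();
}
```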
- - - - - - - - - - - - - - - - - - - -
To keep up with Quantum of Content, please subscribe to my updates on Facebook or follow me on Twitter.