Yes, I have to say it: I’m a Performance junkie. I’m always looking for the fastest and most suitable solution to a particular problem in the architecture of the platforms I work on. Along the way, I’ve learned a lot about caching systems, HTTP server optimization (my favorite right now is Nginx), Database tuning, and more. But everything changed when I was put in charge of the first Mobile app development in my organization.
Here in Cuba, we see technology from a different perspective than the rest of the world, because most of the time we are offline. Recent changes in the services of ETECSA (the country’s telecommunications company) have allowed more Cubans to connect more often, but the majority of the population can’t afford the service (an Internet access card for public Wi-Fi costs 2 CUC per hour, the equivalent of 2 USD per hour, and the hotspots are located in specific zones around the country). Because of this, you have to build every Mobile app with an offline-first mindset, and you have to get it right the first time, because:
- You have only one chance to prove to users that your app is valuable
- You don’t get any form of feedback, because many users simply can’t send it to you
So my team and I started digging deep into Mobile performance issues and how to fix them quickly.
We focused first on the offline version of the app, and then began to think about incorporating some small online features. In the process, we learned that so many problems can affect the performance of a Mobile app that you can feel intimidated, but many of them can be solved today with detailed planning of your Mobile app development and a complete view of all the performance metrics that could affect your app. That is the main objective of this post: to help you quickly identify these problems, the solutions you can apply today, and the products and teams that can help you in the process. So, let’s start the journey.
So, you know that you have a performance problem in your Mobile app. What to do?
When anyone identifies a problem, they need facts and numbers to see what’s going on. These numbers are commonly known as metrics. In the Startup world, there are a lot of metrics that a CEO or a CTO must track to see the success (or failure) of a product. Here I will focus on the CTO, who is in charge of the technology changes and development of a company. But if you are a Mobile-first company: which metrics do you have to measure and focus on to evaluate the success of your Mobile app? And how can you measure all of them easily and in near real time?
First, you have to know which metrics are associated with performance in Mobile:
- App crashes
- API Latency
- End-to-End application latency
- Network errors
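To make these four metrics concrete, here is a minimal sketch of how they could be computed from raw telemetry. The `Session` record and its field names are hypothetical, invented for this illustration; they are not the data model of any monitoring product mentioned in this post.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Session:
    """One app session as a hypothetical telemetry record."""
    crashed: bool            # did the app crash during this session?
    api_latencies_ms: list   # duration of each API call
    end_to_end_ms: list      # tap-to-rendered time as seen by the user
    requests: int            # network requests attempted
    network_errors: int      # requests that failed at the network level


def percentile(values, pct):
    """Return the pct-th percentile (1 to 99), interpolating between points."""
    return quantiles(values, n=100, method="inclusive")[pct - 1]


def summarize(sessions):
    """Aggregate the four Mobile performance metrics over many sessions."""
    api = [ms for s in sessions for ms in s.api_latencies_ms]
    e2e = [ms for s in sessions for ms in s.end_to_end_ms]
    total_requests = sum(s.requests for s in sessions)
    return {
        "crash_rate": sum(s.crashed for s in sessions) / len(sessions),
        "api_p95_ms": percentile(api, 95),
        "e2e_p95_ms": percentile(e2e, 95),
        "network_error_rate": sum(s.network_errors for s in sessions) / total_requests,
    }


sessions = [
    Session(False, [100, 200], [900, 1200], requests=2, network_errors=0),
    Session(True, [300, 400, 500], [1500, 3800, 2100], requests=3, network_errors=1),
]
report = summarize(sessions)
```

A high percentile is used here instead of the average because a few very slow requests are usually what users notice first; that choice is mine, not something prescribed by the sources in this post.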
To dig deeper into these metrics, I recommend reading a whitepaper from AppDynamics, a leader in the Application Performance Monitoring space. Their approach, which they call Unified Monitoring, means that in a single, clean dashboard you can track performance problems across your servers, databases, application code, and even containers. Beyond that, they have a separate product focused on app performance monitoring called AppDynamics Mobile Real-User Monitoring. With this dashboard you can track everything related to the performance of your Mobile apps, and they provide guidance on how to overcome these problems and, in the long term, increase your app’s engagement metrics, which can translate into an increase in two of the most important metrics for a Mobile app: Average Revenue Per User (ARPU) and Lifetime Value (LTV). You may be wondering why I’m focusing on Performance metrics and not on these last two. If you read The App Attention Span Report, also released by AppDynamics, you will find that:
Data generally shows that anything over 3–4 seconds total response time and the majority of user (60% or greater) will abandon the transaction and may even delete your app altogether
If your Mobile apps have a huge impact on your company’s revenue and you lose 60% or more of your users, you have a huge problem on your hands. Think again about the performance of your Mobile apps, and fix every issue you find.
What about the Mobile Network?
Another enormous issue is the Mobile network. It is one of the most critical issues for Mobile app developers, because you usually can’t control the speed of Mobile carrier networks, and a huge number of factors can affect network performance: location, network latency, type of device, time of day, network infrastructure, and more. One of the most challenging problems here is known as the Last Mile Problem, which refers to the wireless network issues between the Radio Access Network antenna and the user’s device; 70 to 90% of latency problems occur there. If you think about this for a moment, an American user of your app is not in the same context as an Asian user (Mobile and broadband Internet speeds are faster in many Asian countries than in the U.S.). Even an American user in Seattle, WA, has a very different context from a Mobile user in San Francisco. So, how can you solve this critical problem and avoid becoming part of the $24.5 billion left on the table because of slow Mobile apps?
Twin Prime can solve your problems here
This team has created a product focused on exactly this problem. As they put it, the secret sauce of the product is something called Global Location based Acceleration Strategies (GLAS): a Machine Learning platform captures, in real time, the context of a particular user as a set of variables (more than 1 billion permutations) and creates a unique “acceleration strategy” for that user based on the deep analysis the platform performs. This is a very complex problem solved for you, so you don’t have to worry about network problems in your Mobile app; you can address them in about 10 minutes by integrating Twin Prime’s SDK into your Mobile app, and then analyze everything in a single dashboard that presents it all right away. If you want to see how GLAS works, read this, and if you want to dig deeper into Mobile performance issues, you must read the whitepaper “A Billion Reasons For Inconsistent Mobile Performance and How To Solve for Them” from Frost & Sullivan. I think Kartik and Satish have something unique with this product, and I think it will become even more relevant in the coming years.
To conclude this part, I will finish with a question: what if AppDynamics and Twin Prime joined forces? They could create a major partnership to completely dominate the market, helping their customers tackle Application Performance Monitoring problems from almost every possible angle. If you have an idea, please leave a comment or send me a Tweet with the #DynamicsPrime hashtag.
Worried about your Mobile website? Take the advice of Google’s Ilya Grigorik
Ilya is a well-known Web Performance expert at Google who has given many talks about Mobile website performance. One of my favorites is “Breaking the 1000 ms Time-to-Glass Mobile Barrier”, which I found on the Google Ventures site. In this talk, Ilya gave a lot of incredible advice on how to optimize the “critical rendering path” based on several key principles, like:
- Stream the HTML response to the client
- Get CSS down to the client as fast as you can
- Watch out for extra round-trips!
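As a rough illustration of the first two principles, here is a minimal sketch of a Python WSGI app that sends the page in chunks and inlines a small piece of CSS in the first chunk. It is not code from the talk; `load_items` is a hypothetical stand-in for whatever slow work produces the page content, and the markup is invented for the example.

```python
from html import escape


def load_items():
    """Hypothetical data source; a real app might query a database here."""
    yield "Shoes & bags"
    yield "Jackets"


def app(environ, start_response):
    """WSGI app that streams the page in chunks instead of one big body."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])

    def body():
        # First chunk: the <head> with critical CSS inlined, so the browser
        # does not need an extra round trip for a stylesheet before rendering.
        yield (b"<!doctype html><html><head>"
               b"<style>body{margin:0;font-family:sans-serif}</style>"
               b"</head><body>")
        # Later chunks: content that may take longer to produce.
        for item in load_items():
            yield f"<p>{escape(item)}</p>".encode("utf-8")
        yield b"</body></html>"

    return body()


# To try it locally with the standard library server:
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, app).serve_forever()
```

Whether each chunk actually reaches the client early depends on the WSGI server and on any proxy in front of it. Nginx, for example, buffers proxied responses by default, so treat this as a sketch of the idea and not a guarantee of streaming behaviour.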
To evaluate your Mobile site, he showed how to use Google PageSpeed Insights. Unfortunately, here in Cuba we can’t access any commercial product from Google (except Google Analytics), so we use Pingdom’s FPT tool instead, and its results are similar to those provided by Google’s tool. To prove this, I analyzed the performance of three major Mobile ecommerce companies: Zalando (Germany), Flipkart (India) and Lazada SG (Singapore). Here are the screenshots for them: Zalando SE
As you can see, there is always room for improvement, so I encourage the development teams at these companies to use PageSpeed Insights or Pingdom’s FPT to run their own analysis, take the advice given there, and work toward a score of 100 points. And take one more piece of advice from Akamai: Web Performance Matters more than you think.
What about the base OS on the server?
If you ask your System Engineering and SRE teams, you will be amazed (if they are doing a good job) by how many things you can do today to optimize an Operating System. Many modern tech companies use a Unix-based OS for this, particularly Linux distributions. With the current trend of moving infrastructure to the Cloud, a lot of companies use Amazon Web Services, and AWS is always looking for improvements to its own Linux image, the Amazon Linux AMI (Amazon Machine Image). Some months ago, I wrote about the previous version of the AWS AMI and why it was an incredible choice for Modern Analytics platforms. But the latest version, released some days ago, is even more capable.
The new Amazon AMI 2015.09 is an amazing piece of Engineering
This version, announced by Jeff Barr (Chief Evangelist at AWS) on the AWS blog with a guest post from Max Spevack, Development Manager for the Amazon Linux AMI, is simply amazing. Two of my favorite features are that it includes PostgreSQL 9.4 in its repositories (9.4.4 exactly) and that it uses a Linux kernel from the 4.1.x series (4.1.7 exactly). You must be wondering why this matters for Mobile site performance. The answer comes down to two critical things: TCP Congestion Control Algorithm improvements and Kernel Live Patching. I will explain why each is important.
TCP Congestion Control Algorithm improvements
In the Linux kernel 3.x series, an improvement to the TCP Congestion Control algorithm was introduced, and it changed the way we can tune the TCP network stack in Linux. But what exactly is this? The keyword here is Slow Start, which controls the number of segments that are allowed to be in transit: the congestion window. This is well described in RFC 5681, where you can read that at most 4 segments are allowed initially, at the start of a TCP connection. After that, the congestion window grows exponentially. For Real-Time apps like modern web apps, this behaviour hurts performance because of the low value of the Initial Window: more round trips between the server and the client are needed before the client has all the data from the web app.
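The effect of the Initial Window on round trips can be sketched with a small idealized model. The window of 4 segments is the figure mentioned above; the 1460-byte segment size, the 64 KB payload and the comparison window of 10 segments (the value proposed in RFC 6928) are assumptions added for illustration. The model ignores packet loss, the receive window and the connection handshake.

```python
import math


def round_trips(payload_bytes, initial_window, mss=1460):
    """Round trips needed to deliver a payload under idealized slow start.

    Assumes no loss, no receive-window limit, and a congestion window
    that doubles on every round trip. The handshake is not counted.
    """
    segments_left = math.ceil(payload_bytes / mss)
    cwnd = initial_window
    trips = 0
    while segments_left > 0:
        segments_left -= cwnd   # send one full window in this round trip
        cwnd *= 2               # exponential growth during slow start
        trips += 1
    return trips


payload = 64 * 1024  # hypothetical 64 KB response
print(round_trips(payload, initial_window=4))   # 4
print(round_trips(payload, initial_window=10))  # 3
```

On a Mobile network where one round trip can cost hundreds of milliseconds, saving even a single round trip per connection is noticeable to the user.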
With the new kernel 4.1 series, the kernel development team introduced a lot of great new features and improvements:
- Incorporate zero pages into transparent huge pages. This improves transparent hugepage collapse rates
- Provide a “Xen PV” APIC driver to support >255 VCPUs
- 6lowpan: Add generic next header compression layer interface, add udp compression via nhc layer, add other known rfc6282 compressions
- tcp: RFC7413 option support for Fast Open client and server
- tcp: add TCP_CC_INFO socket option to get flow information from Congestion Control modules
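To check whether TCP Fast Open is enabled on a Linux server, you can read the `net.ipv4.tcp_fastopen` sysctl, which is a bitmask. The following is a minimal sketch that decodes only the two lowest bits (client and server support); the function names are my own, and other bits of the mask are deliberately ignored.

```python
from pathlib import Path

TFO_SYSCTL = Path("/proc/sys/net/ipv4/tcp_fastopen")
CLIENT_BIT = 0x1  # Fast Open enabled for outgoing connections
SERVER_BIT = 0x2  # Fast Open enabled for listening sockets


def decode_tcp_fastopen(value):
    """Decode the client and server bits of the tcp_fastopen bitmask."""
    return {
        "client": bool(value & CLIENT_BIT),
        "server": bool(value & SERVER_BIT),
    }


def read_tcp_fastopen(path=TFO_SYSCTL):
    """Return the decoded flags, or None if the sysctl is not available."""
    try:
        return decode_tcp_fastopen(int(path.read_text().strip()))
    except (OSError, ValueError):
        return None  # not Linux, or the kernel does not expose the setting


print(read_tcp_fastopen())
```

On a machine where the file does not exist, the function returns `None` instead of raising, so the same script can run on a development laptop and on the server.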
Live Kernel Patching
This is another of the most radical changes, and it will redefine the use of Linux as a server. The live kernel patching project has been an effort of many teams working together with a single objective in mind (the power of Open Source again!): to create a new mechanism for LIVE patching of kernel functions, which means you can patch the running kernel on your server without shutting the server down to start using the new code. This is important for critical applications hosted in the Cloud, where 100 ms of downtime is equal to a 1% loss of sales, so this is huge for critical Cloud-based apps. To read more about this, I leave you this post from Josh Poimboeuf, Senior Software Engineer at Red Hat and one of the main developers of kpatch. Again, if you are interested in monitoring your servers, you can use AppDynamics Server Monitoring, or AppDynamics AWS if you are using AWS.
If you use AWS, every talk from Brendan Gregg about Linux performance on EC2 is a must-see
If you are tuning your Cloud servers, please read everything related to this tricky topic on the blog of Brendan Gregg, Senior Performance Architect at Netflix, particularly these two talks:
If you like to tune your Linux servers to get the maximum possible performance out of them, you must see these talks from Brendan. If you have any questions, he is very open, so don’t hesitate to ask, but do it after your own deep research: ask smart questions.
What about the Database Storage layer?
If you use PostgreSQL (like me) in your own Data Center: Use Pure Storage’s FlashArray //m
If you are a System Engineer and you have dealt with large-scale systems in a Data Center, you have probably used a disk array at least once. In my case, I’ve used them a lot, and I often find myself tuning everything in the base OS to squeeze out every advantage, only to discover that one of the main root causes of a particular issue in my Database systems lies in the SAN/NAS systems. The cause is mainly the speed of the disks, or that they are not ready to deliver fast performance for a large quantity of reads and writes, more commonly known as I/O operations; they are not ready for Real-Time Data Ingestion applications. I began to wonder to myself:
“C’mon, we need to find a suitable and cost-effective alternative to this problem”
I began to focus my research on recent innovation in the Storage layer, and I found Pure Storage. But I found more: I learned that Pure Storage works well with PostgreSQL when I read how Yodle migrated their Storage layer to it, achieving a 5-to-1 data reduction for the database and a dramatic decrease in the execution time of all database queries. You may be wondering why I’m putting this here. The answer is simple: whether you use Apache Cassandra, PostgreSQL or MongoDB, the Database is always a critical point of your Mobile app, because you store everything there. So when you are optimizing your infrastructure to deliver the best service to your users, you must take a deep look at your database and, of course, at your Storage layer. To see everything in your Database servers, AppDynamics has released a new product called Database Performance Management, where you can analyze locks, execution plans, slow database response times and more.
BTW, they are looking for a Product Manager for this particular product, so if you are interested, I leave you the link here.
What about the HTTP and App Backend layers?
Many companies right now are using the MEAN Stack (MongoDB, ExpressJS, AngularJS and NodeJS) for Mobile app backend development, with Nginx in front of it. So you need to see what happens in these layers too. For example, for Nginx + NodeJS tuning, I found this great post from GoSquared’s Engineering team, where they explain brilliantly how to do it. One of the hardest problems to tackle when you develop with NodeJS is the memory leaks that your app could create. For ideas on how to solve this issue, you can read these two posts:
- Debugging Memory Leaks in Node.js Applications, from Vladyslav Millier
- Understanding Node.js Memory Leaks, by Omed Habib, Principal Product Manager at AppDynamics
If you have a performance problem with your NodeJS application, talk with Omed and his team; they are ready to help you.
So, when you are in the privileged position of developing a Mobile app aimed directly at consumers, you have to take a deep look at its performance, and with AppDynamics, Twin Prime and Pure Storage, you can work hard to deliver the best service to your users. Mobile Performance can be a pain in the ass and can badly hurt your revenue metrics, so keep a close eye on it and work hard to eliminate every issue related to it. Thanks for your time, and let me know what you think.