The challenges of building HTML5 CTV applications and how to solve them

Co-Founder and Managing Director Matt Duhig explains the challenges posed by building CTV apps with HTML5 and why WebGL is the future of TV app development.
 
by Matthew, 24th August 2021
Connected TV

Connected TV is a particularly fragmented space. Not only do devices come in all shapes and sizes, but they also each have differing hardware and software. This variance ultimately makes it incredibly challenging to develop an application that performs and behaves (and I mean behaves) consistently across each of them.

Over the past few years, there have been many different approaches to building applications for Connected TV devices. Some have preferred to use a single codebase and run a web-based application across all devices, while others have opted for four distinct codebases to create independent applications for Roku, Apple TV, Android TV, and the web-based devices (Samsung Tizen, LG webOS, Xbox, PlayStation, STBs etc.). In almost every instance, it’s agreed that when building for those web-based devices, it’s best to use a web application approach, meaning a single codebase shared across all of them.

This single codebase approach has typically meant an HTML5 application written primarily in JavaScript. However, while frameworks such as React are great, it has not been practical to use them on TV devices, which are often dated and limited in their ability to render React applications in a performant manner. Herein lies the problem: for years the world has been subjected to poorly performing TV applications on some pretty major devices, and it’s time for a change.

The Challenges

So why is it so difficult to build an HTML5-based web application that will run in a performant manner on devices? Well, before we look at the application, we must first look at the devices.

The challenges posed by devices

There are many devices available that fall under the Connected TV moniker: Smart TVs such as Samsung Tizen and LG webOS devices, consoles such as PlayStation 4 and Xbox One, streaming devices such as Amazon Fire TV, Android TV and Apple TV, and Set-Top-Boxes (STBs) such as Comcast and Sky Q. As you can imagine, each of these devices is built to an entirely different specification, and while some, such as the Xbox One, are quite punchy, others, such as the lower-end STBs, will barely render an image.

When building applications for Connected TV, we need to consider all of these devices and ensure that we are developing something that caters for the weakest link and gives a great consistent user experience across each device. When working with these devices, it helps to have an understanding as to their constraints, and it comes down to a few things including hardware, progression and software.

Poor device hardware

Smart TV and STB devices may do some pretty impressive things these days, such as allowing users to control their device with their voice. However, the hardware used by these devices is still very limited. Not only this, but in most cases vendors will limit the amount of resources they permit application developers to use. For example, a console may have 4GB of RAM, but don’t be surprised if the vendor has restricted streaming applications to 1GB while the rest is reserved for the OS and multitasking games.

Free RAM for your application is just one area of concern, as it’s pretty common for these devices to also be fitted with relatively limited processors. These limited processors mean that a laggy UI is a common experience in applications, as the processor struggles to handle many requests at once (especially if the CPU has a low number of threads).

Slow device progression

Technology moves fast, you say? Well, not on Connected TV devices, I’m afraid. Consumers do not tend to upgrade their TV devices often, with some experts saying that they only upgrade once every 7-8 years. While I believe that consumers upgrade their TVs more often than that, I don’t believe that the hardware in these devices progresses at the same rate as it does in a laptop or tablet, especially when we consider the low and mid range of TV devices. As hardware becomes cheaper, vendors will ultimately opt to spend less on hardware and increase their margins, rather than keep the same spend and benefit from better hardware.

Legacy device software

In order to run web applications, devices must have a browser, and this is installed along with the rest of the software when the device is manufactured. The browsers used by these devices will rarely be the latest and greatest. In most instances they’ll be a fork of a browser that may well have been modern when the TV was being designed, but that has since become outdated and has not been refreshed in any of the vendor’s subsequent software updates. What this ultimately means is that developers of Connected TV web applications find themselves working with all sorts of browsers.

At FX we’ve come across the likes of Chrome 39 (we’re now on version 92 as of writing this article), Opera 36 (circa 2016), Safari 6.1.6 (circa 2013) and QtWebKit (a Safari 5 fork from 2010). What this means for developers is that many of the latest browser features will not be usable on Connected TV devices, and in some instances they’ll find themselves having to brush up on their ES5 JavaScript and avoid modern features such as arrow functions and promises. When building a web application that needs to work across many of these different browsers, you can start to see how difficult it can be to deliver a consistent experience to the end user.
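
As a rough illustration of what working within these constraints can look like, the sketch below checks whether a feature exists before relying on it, and uses a plain callback in place of a promise. The names `hasNativePromise` and `loadLabel` are hypothetical, and the example assumes the TypeScript compiler is configured to target ES5, which rewrites newer syntax but does not add missing runtime features such as `Promise`.

```typescript
// Hypothetical shape of the global object on a TV browser; Promise may be absent.
interface GlobalLike {
  Promise?: unknown;
}

// Returns true only when the runtime exposes a Promise constructor.
function hasNativePromise(globalObj: GlobalLike): boolean {
  return typeof globalObj.Promise === "function";
}

// Node-style callback: an error first, then the value when there is one.
type Done = (error: Error | null, value?: string) => void;

// Callback-based lookup that behaves the same with or without promise support.
function loadLabel(key: string, labels: { [key: string]: string }, done: Done): void {
  if (Object.prototype.hasOwnProperty.call(labels, key)) {
    done(null, labels[key]);
  } else {
    done(new Error("Unknown label: " + key));
  }
}
```

An application could call `hasNativePromise` once at start-up and load a polyfill only on the browsers that need it.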

The challenges posed by applications

Looking beyond the devices themselves, the methods used to build a web-based TV application can have a considerable impact on performance. Traditionally the favoured approach has been to use HTML5, with frameworks such as BBC TAL being developed to try to make the job easier. However, using HTML5 can have significant drawbacks on what are very low-specification devices.

The weight of the DOM

The DOM is a cross-platform and language-independent interface that treats an XML or HTML document as a tree structure wherein each node is an object representing a part of the document. If you inspect a web page in your browser’s developer tools, the tree of nodes (or elements) you are presented with is the DOM. The DOM is great for web development: it gives developers a clean way of interpreting a page, whilst also allowing them to make dynamic modifications to it, and therefore to the interface, using JavaScript (such as changing the position of elements).

However, while it helps to improve developer experience, it does have some drawbacks, particularly on those low-spec devices. The DOM gives us very little insight into memory and performance, and DOM rendering is relatively slow when compared to other approaches. Every DOM node added to a document (page) consumes more memory, so we often find ourselves avoiding deeply nested DOM trees or long pages, and implementing techniques such as windowing (keeping only the elements on or near the screen in the document) to mitigate the effects of having too many elements. As you may have guessed, without some very stringent rules on how HTML5 applications are architected and actively developed, using the DOM to build them can quickly become cumbersome.
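
To make windowing a little more concrete, here is a minimal sketch of the calculation behind it: given the focused item in a rail, work out which slice of items should exist as DOM nodes. The function name `visibleRange`, its parameters and the assumption that the focused item sits at the leading edge of the rail are all illustrative rather than taken from any particular framework.

```typescript
// Half-open range of item indexes to keep in the DOM: start is included, end is not.
interface WindowRange {
  start: number;
  end: number;
}

// Works out which items to render for a rail whose focused item sits at its leading edge.
// visibleCount is how many tiles fit on screen; overscan is how many extra tiles
// to keep either side so that the next key press does not reveal a gap.
function visibleRange(
  focusedIndex: number,
  totalItems: number,
  visibleCount: number,
  overscan: number
): WindowRange {
  if (totalItems <= 0) {
    return { start: 0, end: 0 };
  }
  // Clamp the focus so that out-of-range values cannot produce an invalid window.
  const focus = Math.min(Math.max(focusedIndex, 0), totalItems - 1);
  const start = Math.max(0, focus - overscan);
  const end = Math.min(totalItems, focus + visibleCount + overscan);
  return { start: start, end: end };
}
```

With 100 items, five visible tiles and an overscan of two, focusing item 10 keeps only items 8 to 16 in the document, so nine nodes exist rather than one hundred.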

The efficiency of CPU management

When using a TV application there’s plenty that could be going on at any one time. While the application is loaded, you could be navigating around the screen, initiating various animations, whilst analytics tracking requests are being made under the hood. When you reach a video, you could be playing this back whilst interacting with the controls overlaid. Whenever multiple processes are being run at once, CPUs need to make use of multiple threads to carry out what’s required.

As we’ve already shared, TV devices typically have poor CPUs, and the more multitasking we do, the more this becomes apparent. This is often the cause of any laggy rendering you may experience when using an application, for example when you try to use the player UI during playback of a video asset. As a result, when building HTML5 web applications we have to be very selective when using animations or attempting to do anything particularly complex while another process is running. To give you an idea of just how tricky this can get, I’ve experienced instances in which the buffer wheel for a player is shown because the video is genuinely buffering, but the animation of the wheel then competes with video playback for CPU time and causes further issues.
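
One way of being selective about what runs when is to hold back low-priority work, such as analytics, while the UI is busy. The sketch below is only an illustration of that idea: the class name `DeferredQueue`, the notion of a "busy" flag and the `send` callback are all assumptions, and a real application would decide for itself what counts as busy (an animation in flight, a key being held down and so on).

```typescript
// Receives a batch of event names; in a real app this might make a network request.
type Sender = (batch: string[]) => void;

// Queues low-priority events while the UI is busy and sends them in one batch afterwards.
class DeferredQueue {
  private pending: string[] = [];
  private busy = false;
  private send: Sender;

  constructor(send: Sender) {
    this.send = send;
  }

  // Call with true when an animation or navigation starts, and false when it ends.
  setBusy(busy: boolean): void {
    this.busy = busy;
    if (!busy) {
      this.flush();
    }
  }

  // Records an event, sending it straight away only if the UI is idle.
  track(eventName: string): void {
    this.pending.push(eventName);
    if (!this.busy) {
      this.flush();
    }
  }

  // Sends everything queued so far as a single batch.
  private flush(): void {
    if (this.pending.length === 0) {
      return;
    }
    const batch = this.pending;
    this.pending = [];
    this.send(batch);
  }
}
```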

Imagery and styling

First and foremost, in any web application you’re building it’s important to really crunch down the size of the images that are being used. Enormous images (in dimensions and/or file size) will only cause you pain. If you’ve ever pressed a directional arrow on a TV remote to navigate down, causing the screen to move all of the elements, and been frustrated at how long it’s taken to repaint the screen, this is likely due to the browser struggling with an enormous image (or images) on display.
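
As a small worked example of crunching images down, the sketch below calculates the largest dimensions an image needs in order to fill a tile, so that it can be resized before it ever reaches the device. The function name `fitWithin` and the `Size` type are hypothetical, and the sketch assumes the image should keep its aspect ratio and never be scaled up.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scales the source down to fit inside the tile, keeping its aspect ratio.
// The scale is capped at 1 so that small images are never enlarged.
function fitWithin(source: Size, tile: Size): Size {
  const scale = Math.min(tile.width / source.width, tile.height / source.height, 1);
  return {
    width: Math.round(source.width * scale),
    height: Math.round(source.height * scale)
  };
}
```

For a 480 by 270 tile, a 3840 by 2160 source only needs to be delivered at 480 by 270, which is one sixty-fourth of the pixels the browser would otherwise have to decode.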

We often use cropping tools and lossless compression to help here, but there are other more advanced things we may like to do with imagery to bring some funk to our application. A good example of this extra flair may be to add an opacity to an image using CSS.

Unfortunately, for all of the aforementioned reasons, TV devices will struggle to manipulate images in this way. A lot of the time we find ourselves asking designers to modify images to add opacity before sending them down to us, as it allows us to create much more performant applications.

Animation

When TV devices struggle to render your application, animation is often the first thing developers decide to remove. This is a real shame, as animation often brings a degree of ‘polish’ to an application that can set it apart from rivals. Animations require a degree of CPU performance in order to be able to run smoothly on any device, and as we know by now TVs are not blessed with this.

To try to combat this we can use a combination of the aforementioned techniques, such as windowing, and also use particular CSS transitions that the browser will GPU accelerate. Alternatively, we can try to be clever by using GIFs instead of CSS. This can be particularly useful in helping us to manage CPU load when trying to render multiple animations at any one time.
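
The sketch below shows the kind of transition we mean. It moves an element with `transform: translate3d(...)` rather than by changing `top` or `left`, as many browsers can hand transform changes to the GPU instead of recalculating layout. The function name `slideTo` is hypothetical, and the `-webkit-` prefixed properties are included on the assumption that some older WebKit-based TV browsers still need them.

```typescript
// A plain map of CSS property names to values, ready to be applied to an element.
interface StyleMap {
  [property: string]: string;
}

// Builds the styles for sliding an element to (x, y) over durationMs milliseconds.
function slideTo(x: number, y: number, durationMs: number): StyleMap {
  const translate = "translate3d(" + x + "px, " + y + "px, 0)";
  return {
    // Prefixed versions first, for older WebKit-based browsers (an assumption).
    "-webkit-transform": translate,
    "-webkit-transition": "-webkit-transform " + durationMs + "ms ease-out",
    transform: translate,
    transition: "transform " + durationMs + "ms ease-out"
  };
}
```

In a browser, each entry could be applied with `element.style.setProperty(name, value)`.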

All in all though, it takes considerable effort to achieve acceptable animation on TV devices, and it is common practice to deactivate animations entirely on lower-end devices that will struggle to render them in an acceptable manner.
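
In practice this often comes down to a simple per-device switch. The sketch below assumes each device has already been classified into a tier. The tier names, the function name `animationSettingsFor` and the durations are purely illustrative and would need tuning against real devices.

```typescript
// Hypothetical device classification, decided elsewhere (for example from the user agent).
type DeviceTier = "low" | "mid" | "high";

interface AnimationSettings {
  enabled: boolean;
  durationMs: number;
}

// Turns animation off on low-end devices and shortens it on mid-range ones.
function animationSettingsFor(tier: DeviceTier): AnimationSettings {
  if (tier === "low") {
    return { enabled: false, durationMs: 0 };
  }
  if (tier === "mid") {
    return { enabled: true, durationMs: 150 };
  }
  return { enabled: true, durationMs: 300 };
}
```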

The solution: WebGL

I have summarised the major challenges of building an HTML5 application for a Connected TV device, and in a lot of instances I have suggested workarounds or alternatives that could help to maximise the capabilities of an application. However, in order to really push the boundaries of a Connected TV device and achieve significant performance gains, we need to forget traditional concepts of web development and instead look towards something else: WebGL.

WebGL is a JavaScript API for rendering interactive 2D and 3D graphics within any compatible web browser without the use of plug-ins. WebGL is fully integrated with other web standards, allowing GPU-accelerated usage of physics and image processing and effects as part of the web page canvas. In essence, WebGL gives us direct access to something that we would not otherwise be able to work with on Connected TV: the GPU. This significantly improves the rendering of an application, and ultimately leads to a considerable reduction in memory consumption. Furthermore, we suddenly only need a single DOM node: the canvas in which WebGL will work its magic.
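
As a minimal sketch of that single DOM node in action, the function below asks a canvas for a WebGL context, trying the standard `webgl` name first and then the older `experimental-webgl` name that some legacy browsers used. The `CanvasLike` interface and the function name `getWebGLContext` are assumptions made so that the example stands alone. In a browser you would pass in a real `canvas` element.

```typescript
// The one method we need from a canvas; a real HTMLCanvasElement satisfies this.
interface CanvasLike {
  getContext(contextId: string): unknown;
}

// Returns a WebGL rendering context, or null when the browser offers none.
function getWebGLContext(canvas: CanvasLike): unknown {
  return (
    canvas.getContext("webgl") ||
    // Older browsers exposed WebGL under this name before it was standardised.
    canvas.getContext("experimental-webgl") ||
    null
  );
}
```

A `null` result is the cue to fall back to another renderer or to tell the user the device is unsupported.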

Unfortunately for developers, WebGL isn’t one of the easier JavaScript APIs to work with. Fortunately for developers, there’s an open-source framework called Lightning that makes it all that little bit better. Lightning abstracts the complexities of WebGL and provides a useful API designed with Connected TV in mind. When we use Lightning on Connected TV devices, suddenly we’re able to begin pushing the boundaries.

Not only are animations possible thanks to the unlocking of the GPU, but we can also get even more creative with them thanks to WebGL shaders. Having access to the GPU also means that the pressures previously placed entirely on the CPU can be shared with the GPU, leading to much greater multitasking and rendering all round.

Going one step further than this, memory consumption is also considerably lower due to the significant reduction in DOM nodes, with the browser needing only to work with a single canvas element.

When it comes to consistency between devices, we find that using the WebGL approach brings significant benefits too. With one single canvas element, we no longer have issues with each browser rendering HTML elements and styles differently to one another. Everything is drawn onto the canvas in a very predictable manner.

The net result of all of these benefits is a much more performant and much more predictable application. Arguably one downside of using this method is the need for developers to understand an almost entirely different workflow. Onboarding and training can be more complex, as using WebGL requires a completely different approach to development. It’s much more difficult to interpret what’s happening in the browser when there is only a single canvas element in your dev tools, for example. However, there are many tools to help with this, and once developers are onboarded and things are underway, we often find processes become much more efficient.

Conclusion: WebGL is the way forward

At FX Digital we have developed our technology to allow us to build our applications with Lightning. This means that, using a single codebase, we can deploy natively to Apple TV and Android TV based devices, and deploy to all other web-based platforms using the Lightning WebGL renderer.

Not only does this give us significant device reach, but it also gives us incredibly performant applications on both the high-end and the lower-end Connected TV devices.

While we still take performance very seriously during development, we now have much more available resource capacity in the devices we work with to really push the boundaries of what’s possible on Connected TV. The consistency this approach brings means that we can create exceptional applications that look and perform incredibly across a significant number of devices.