Learn - File-Based Media Workflow Solutions
What to Look for in an Enterprise Transcoding Platform
Editor’s note: This article was originally published as a white paper by Kirk on his company’s website; however, I found the information so useful that I felt it was important to include it here as well. A. Beach, Comrade-in-Chief
By Kirk Marple, RADIANTGRID TECHNOLOGIES, LLC.
Transcoding has come a long way from its new media origins, especially in broadcast and content production applications. These days, media organizations must handle more formats across more platforms with fewer resources than ever before. Managing these types of workflows requires transcoding solutions that offer high levels of performance, reliability, efficiency, and quality. This article attempts to detail these factors further, so that readers can make an informed decision about the transcoding platform that will best fit their company’s needs.
Introduction
Transcoding has quickly become a common term in the broadcast industry. For many, it exists as a commodity: something they can plug into their media workflows when needed. Indeed, the technology to transcode or convert media formats can be found within many software products on the market today. It can also be found in free or open-source products such as HandBrake or FFmpeg. Even an inexperienced software developer can script the conversion from one media format to another, such as MPEG-2 to Windows Media. In this context, it is not inaccurate to view transcoding as a commodity product.
However, as with any core technology, the surrounding applications drive the complexities of the technology. When the critical demands of the business present complex technical requirements, along with the need to achieve efficiencies of scale with a smaller workforce, we need to change our view of transcoding and its role in the workflow. Such a scenario is currently playing out within the broadcast industry. This article examines the role transcoding plays in organizations generating content today, including how new file formats such as the AMWA’s MXF AS-02 and MXF AS-03 fit into this process. We will cover several topics, including transcoding in file-based workflows, format management within transcoding, media transformation and automation, and quality factors. While there are many other points worth covering about this technology, we believe the above are the most critical to consider when selecting a transcoding platform.
A Brief History
In order to understand the present state of transcoding and its evolution within the broadcast industry, it is useful to review its past. The term “new media” emerged in the late 1990s, as people sought to classify the use of video and audio content within the Internet medium. Due to the smaller size of audio files, technology on this side of the new media revolution progressed very quickly. Technology development came much slower for video, as the large file sizes involved encumbered their use on the Internet and within the media workflow itself. Gradually, however, codecs emerged to optimize video for online delivery, and so did the technology to allow for transcoding between media formats. At this time, High Definition (HD) video was in its infancy, and the bar for video quality was low; what mattered more was the compressed file size, or simply being able to distribute compressed video at all.
As HD video technology advanced over the years, media content providers were not keen to see their expensive productions made available on the Internet, a medium that was primarily free and sacrificed quality. In response, many carved out new media divisions within their organizations, often assigning them their own, separate production goals. Transcoding became a dominant player in these new entities, giving rise to an enterprise-class software transcoding industry.
Once content providers began to make use of Internet distribution methods, they helped push forward more rigid broadcast quality and regulation requirements. Transcoding solutions unable to adapt to these specifications and protocols had a difficult time transitioning into the new market environment. Those that made the cut were platforms capable of extending beyond common codec conversion to embrace the flexible, performance-oriented architectures necessary to support broadcast-centric formats. They could also support HD, closed captioning and loudness standards, as well as uphold broadcast-quality standards.
The transcoding platforms that meet these guidelines are those used by today’s broadcasters, and they are the ones able to support newer broadcast formats such as the AMWA’s AS-03 file format. Multi-division operations with 24x7 on-air requirements count on them: multi-million-dollar production environments supported by advertising dollars. As for your organization, the next few sections will outline how to find the right solution for its unique requirements.
Transcoding in File-Based Workflows
While the process for building a file-based workflow is typically thought of as nonlinear, there is a series of steps that should be followed in a linear manner to ensure successful results. We define these steps as ingestion, indexing, quality control (pre-transcode), transcoding, quality control (post-transcode), publishing, distribution and notification. When followed in the proper order, they help prepare content for various distribution formats, including online, video on demand (VOD) and cable. Depending on the different transcoding processes performed, this workflow can employ a combination of both hardware- and software-based technologies.
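The linear ordering of these steps can be sketched in a few lines of Python. This is only an illustration: the stage names come from the list above, while `make_stage`, `run_workflow` and the `failed` flag are hypothetical names invented for the sketch, and each handler is a placeholder that merely records that it ran.

```python
from typing import Callable, Dict, List

# Stage names follow the order given in the article; the handlers are
# hypothetical placeholders that only record that they ran.
STAGES: List[str] = [
    "ingestion",
    "indexing",
    "quality_control_pre",
    "transcoding",
    "quality_control_post",
    "publishing",
    "distribution",
    "notification",
]


def make_stage(name: str) -> Callable[[Dict], Dict]:
    """Return a placeholder handler that appends its name to the job history."""
    def handler(job: Dict) -> Dict:
        job.setdefault("history", []).append(name)
        return job
    return handler


def run_workflow(job: Dict, stages: List[str] = STAGES) -> Dict:
    """Run every stage in order, stopping early if the job is marked failed."""
    for name in stages:
        job = make_stage(name)(job)
        if job.get("failed"):
            break
    return job


result = run_workflow({"source": "example.mxf"})
print(result["history"])
```

In a real platform each stage would of course do actual work and could run on different hardware; the point here is only that a job passes through the stages in a fixed order and can be halted when a stage reports a problem.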
When analyzing potential workflows for your organization, key elements to consider are the turnaround time, volume of files being managed, hardware involved and amount of staff available to help with the process. We will walk through each of the steps involved to better understand their requirements and benefits:
Ingestion
All content must be brought into the workflow whether through satellite transmission, stream-based capture, editing workstations or physical media such as a videotape, CD or DVD. When selecting transcoding software for a file-based workflow, seek out one that can handle the majority, if not all, of the above ingestion formats. During ingestion, files are pre-processed into a form that is optimal for the transcoding stage. This can mean breaking down or de-muxing files into their essence formats. From this, mezzanine streams can be made from the source media, which are optimal for transcoding.
Indexing
While ingestion creates the assets in the repository, indexing creates the metadata that is attached to those assets. Via indexing, all media-specific metadata (e.g., file length, frame rate and codec) is pulled out. As metadata is a major component of file-based communications, the optimal software solutions are those that easily manage the data and allow for changes to be made quickly. Many software platforms offer catalog management, which allows for all items associated with a file (a thumbnail view, a preview version, the master) to be packaged for delivery with metadata throughout the workflow. Catalog management also includes the ability to search by any of these connected items.
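As a rough illustration of indexing, the sketch below uses ffprobe, the inspection tool that ships with FFmpeg, to obtain container and stream metadata as JSON and reduce it to the fields mentioned above. It is not how any particular commercial platform indexes media. The function names are our own, and the sample report is hand-written in the shape of ffprobe output rather than taken from a real file.

```python
import json
from fractions import Fraction
from typing import Dict, List


def ffprobe_command(path: str) -> List[str]:
    """Build an ffprobe invocation that emits format and stream metadata as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-print_format", "json",
        "-show_format", "-show_streams",
        path,
    ]


def index_metadata(ffprobe_json: str) -> Dict:
    """Reduce an ffprobe JSON report to file length, codec and frame rate."""
    report = json.loads(ffprobe_json)
    video = next(s for s in report["streams"] if s["codec_type"] == "video")
    return {
        "duration_seconds": float(report["format"]["duration"]),
        "video_codec": video["codec_name"],
        # ffprobe reports frame rate as a rational string such as "30000/1001".
        "frame_rate": float(Fraction(video["r_frame_rate"])),
    }


# Hand-written sample in the shape of ffprobe output (not from a real file).
sample = json.dumps({
    "streams": [
        {"codec_type": "video", "codec_name": "mpeg2video", "r_frame_rate": "30000/1001"},
        {"codec_type": "audio", "codec_name": "pcm_s16le"},
    ],
    "format": {"duration": "1800.5"},
})
metadata = index_metadata(sample)
print(metadata)
```

To index a real file, the list returned by `ffprobe_command` could be passed to `subprocess.run` with `capture_output=True`, and the captured standard output handed to `index_metadata`.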
Quality Control
Once metadata has been indexed from ingested assets, quality control policies can be applied to flag content that doesn’t match expected criteria. For example, this can flag assets with missing metadata, undesirable audio loudness levels, color bars or black slugs, or unexpected codec parameters.
Quality control can be handled by external software applications, which are integrated via web services or simply by watch folder hand-off, or by internal software toolkits that are tightly integrated with the media workflow services platform.
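A quality control policy of this kind can be thought of as a list of named checks run against the indexed metadata. The following sketch assumes hypothetical field names (`title`, `loudness_lkfs`, `video_codec`) and hypothetical limits; a real policy would be defined by the facility.

```python
from typing import Callable, Dict, List, Tuple

# Each policy is a (description, predicate) pair; the predicate returns True
# when the asset passes. Field names and limits are hypothetical.
Policy = Tuple[str, Callable[[Dict], bool]]

POLICIES: List[Policy] = [
    ("title metadata present", lambda a: bool(a.get("title"))),
    ("loudness within 2 LU of -24 LKFS",
     lambda a: abs(a.get("loudness_lkfs", 0.0) + 24.0) <= 2.0),
    ("video codec is expected",
     lambda a: a.get("video_codec") in {"mpeg2video", "h264"}),
]


def flag_asset(asset: Dict, policies: List[Policy] = POLICIES) -> List[str]:
    """Return the description of every policy the asset fails."""
    return [description for description, passes in policies if not passes(asset)]


asset = {"title": "", "loudness_lkfs": -18.5, "video_codec": "mpeg2video"}
print(flag_asset(asset))
```

An asset that returns an empty list passes every check; anything else can be routed to an operator or rejected before it reaches the transcoding stage.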
Transcoding
All audio-, image-, closed captioning- and subtitle-processing happens in the transcoding phase. Transcoding can take place completely within the software platform or via a combination of software and external hardware. Many transcoding software platforms now include audio and video processing, allowing multiple transcoding processes to take place simultaneously. This can speed up the process considerably, particularly when working with a large number of files.
For those looking to use a combination of hardware and software processors, transwrapping (also called transmuxing) is another option. With transwrapping the source file is ingested and the video and audio are de-muxed into essence streams. This allows either the video or the audio to be processed within the platform, then muxed back together, so that the transwrapped file can be processed using hardware processors.
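For readers who want to experiment with the idea, the open-source FFmpeg command line can approximate a transwrap by copying one essence stream untouched while re-encoding the other. The sketch below only builds the command; the file names are placeholders and `transwrap_command` is a name of our own. It illustrates the concept and is not how any commercial platform implements transwrapping.

```python
from typing import List


def transwrap_command(source: str, target: str, audio_codec: str = "ac3") -> List[str]:
    """Build an ffmpeg command that copies video untouched and re-encodes audio."""
    return [
        "ffmpeg", "-i", source,
        "-c:v", "copy",        # pass the video essence through without re-encoding
        "-c:a", audio_codec,   # re-encode only the audio essence
        target,
    ]


command = transwrap_command("source.ts", "output.ts")
print(" ".join(command))
```

Because the video is copied rather than re-encoded, the operation is typically much faster than a full transcode and introduces no additional video generation loss.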
The best transcoding software solutions offer several important features. They can handle not only typical formats such as MPEG-PS or MPEG-TS, but also broadcast formats such as MXF, GXF, LXF and Omneon. Many transwrapping software options can also manipulate audio and ancillary data, including closed captioning, performing processes such as inserting a caption stream or upconverting one caption format to another, such as the SMPTE 436M closed captioning format. Other capabilities include the ability to remap channels, perform loudness correction, and encode into different formats.
Some transcoding software platforms will also handle the assembly of one or more assets into the final product for distribution. This feature can be employed, for example, to stitch a black slug with a promotional interstitial, the master asset (movie, TV show, etc.), trailing interstitial and trailing black slug. Some developers also offer multi-track assembly, enabling the software to act as a basic nonlinear editor for putting together different takes of the same project.
Another factor to consider is the actual transcoding process the software uses, and how it distributes work across the available servers. Some software distributes transcoding tasks across the transcoding farm as capacity becomes available. Though effective, this can limit the speed at which an individual file can be converted, since each file is still handled by a single transcoder. Another option is grid transcoding, which allows source content to be transcoded in parallel across all available transcoding resources, and can speed up the transcoding process.
Publishing
The publishing step is where the transcoded files and metadata are taken into the repository and packaged for delivery. Publishing doesn’t touch the actual media file, but it may put the files into a special directory structure or rename them so they are properly assembled for output.
Distribution
The next step of the process is distribution, which takes the transcoded files, and possibly the metadata, and pushes them out to a file server. The files can then be posted to a website, sent to an online content provider, or made available to an online music download vehicle such as iTunes for distribution and purchase, or to an online video service such as Hulu. With file formats such as AS-03, a third layer of distribution is coming to the forefront. Instead of sending programming to their affiliates via satellite in real time, more and more networks are utilizing file-based delivery, where content is transmitted in a packaged file format and sent via satellite prior to the actual broadcast. While file-based delivery offers the benefit of preventing satellite issues from affecting the actual broadcast of a program, this new delivery format makes transcoding more critical to operations, as this process will be used to package files for distribution.
Notification
Even though there is a lot of software involved in a file-based workflow, nothing ever happens in a vacuum. Notification, the final stage of a file-based workflow, can be handled by a human or an automated process. This can simply be a matter of sending an email or a message via a Web service or notification system within the software platform that informs the final user of the files’ presence. The best solution for your facility depends on the number of people you anticipate utilizing the files and the users’ general proximity to one another.
Creating a file-based workflow combines old practices with new ones. Software allows much of the process to be automated, but there will always be a human element required for the workflow. For instance, dropping a file out of one piece of software and loading up another piece of software, then transcoding it and putting it in another folder will need a human hand. Having a better understanding of the various elements involved in a file-based workflow can only lead to better results and the ability to better manage the content.
Format Management within Transcoding
Most file-based formats in broadcast applications represent multiple layers. This includes the video codec and essence, audio codec and essence, a container and, when applicable, any of the ancillary user and closed caption data. Any complete transcoding solution must be able to handle all of these elements while maintaining the flexibility to adapt to and manage the changes as formats evolve. Broadcasters today are seeking to make formats more intelligent, often by describing the information of the video within a container structure so that the file is easily moved along the production workflow pipeline. MXF (Material eXchange Format) is an excellent example of retaining this intelligence within the file. AMWA’s AS-03 is an application of the MXF file format.
The AS-03 format was developed in a partnership between AMWA and PBS as a vendor-neutral subset of the MXF file format to be used for the delivery of finished programming from producers and distributors to major networks and their affiliates. These files were designed so they could be delivered in their entirety to be cached before playout. AS-03 files contain defined sets of metadata for identification and verification of content versus program traffic metadata that is delivered separately.
The specification of AS-03 files can be further limited by a “shim,” which provides a set of constraints reducing the range of variability that may be needed to define certain applications. The shims create categories that address a particular type of programming or genre. These can also refer to particular requirements of a station or station groups, such as bit rate, aspect ratio and sound essence schemes. A newer file format, AS-03, is seeing growing adoption by many major broadcasters, but only a few transcoding software manufacturers are currently offering AS-03 support. Still, it should be considered if you plan to move to file-based delivery in your facility.
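One way to picture a shim is as a set of constraints that narrows what the base specification allows. The sketch below is purely illustrative: the parameter names and allowed values are hypothetical and are not taken from the AS-03 specification or from any published shim.

```python
from typing import Dict, Set

# Hypothetical values for illustration only; they are not taken from the
# AS-03 specification or from any published shim.
BASE_SPEC: Dict[str, Set[str]] = {
    "video_codec": {"mpeg2", "h264"},
    "aspect_ratio": {"4:3", "16:9"},
    "audio_scheme": {"pcm", "ac3"},
}
STATION_SHIM: Dict[str, Set[str]] = {
    "video_codec": {"mpeg2"},
    "aspect_ratio": {"16:9"},
}


def apply_shim(spec: Dict[str, Set[str]], shim: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Narrow each parameter of the spec to the values the shim also allows."""
    return {key: allowed & shim.get(key, allowed) for key, allowed in spec.items()}


def conforms(file_params: Dict[str, str], constraints: Dict[str, Set[str]]) -> bool:
    """Check that every constrained parameter of the file has an allowed value."""
    return all(file_params.get(key) in allowed for key, allowed in constraints.items())


narrowed = apply_shim(BASE_SPEC, STATION_SHIM)
h264_file = {"video_codec": "h264", "aspect_ratio": "16:9", "audio_scheme": "pcm"}
print(conforms(h264_file, BASE_SPEC))  # allowed by the base constraints
print(conforms(h264_file, narrowed))   # rejected once the shim is applied
```

The same file can therefore be valid against the general format yet fail a particular station's shim, which is why shim awareness matters when preparing files for delivery.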
When using transcoding software, AS-03 MXF files are prepared by transforming the video essence data into MPEG-2, the audio essence data into PCM and AC3 and ancillary (closed caption) data into SMPTE 334M and SMPTE 436M VANC structures, which are muxed into the AS-03 MXF container format.
Transwrapping comes into play here not only because it helps prepare the file but, given the standard MPEG-2 or H.264 essence stream in the AS-03 file, allows broadcasters to repurpose these files for other playout servers or audio decoding hardware without taking the time to re-encode the video. For example, if the AS-03 file comes to the broadcaster using PCM audio, it can be transwrapped to use Dolby E encoded audio, or another audio format that fits best within one’s workflow.
Some questions to ask when evaluating transcoding software:
- How does the transcoding software use the implicit and explicit metadata stored in the file?
- How does the transcoding software deal with decoding and encoding the variety of industry formats?
- Does the transcoding software support faster-than-real-time transcoding via grid processing?
- Does the transcoding software provide transwrapping capabilities, allowing audio or video to be processed without having to re-encode the file?
The transcoding platform should be able to intelligently index files and leverage this information to make processing decisions during the transcode operation. For example, knowing the aspect ratio of the source file allows the creation of the proper output video frame size. If the transcoding software cannot adapt to this incoming data, then the organization will require constant visual quality control in case the source file is different from what the transcoding software was expecting.
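The aspect ratio example can be made concrete with a small calculation. The sketch below scales a source frame to fit inside a target frame without distorting it. It assumes square pixels, which is a simplification, and rounds down to even dimensions because many codecs require them; `fit_frame` is a name of our own.

```python
from fractions import Fraction
from typing import Tuple


def fit_frame(source_w: int, source_h: int, target_w: int, target_h: int) -> Tuple[int, int]:
    """Scale the source to fit inside the target frame, preserving aspect ratio.

    Assumes square pixels; dimensions are rounded down to even numbers because
    many codecs require even width and height.
    """
    scale = min(Fraction(target_w, source_w), Fraction(target_h, source_h))
    width = int(source_w * scale) // 2 * 2
    height = int(source_h * scale) // 2 * 2
    return width, height


print(fit_frame(720, 540, 1920, 1080))   # 4:3 source into a 16:9 frame
print(fit_frame(1920, 1080, 1280, 720))  # 16:9 source into a 16:9 frame
```

In the first case the result is 1440 by 1080, so the remaining width of the 1920-pixel frame would be filled with pillarbox bars rather than stretching the picture.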
It is important to investigate how the transcoding software deals with the formats in the muxing and transcoding stages. Does the software do pre-processing, or are aspects of the format handled only in a post-processing stage? For example, the transcoding software should utilize integrated algorithms or software libraries for optimal handling of closed captioning data or audio formats such as Dolby E. For some situations, the software may need to delegate to an external application through Web services integration, which affects the overall processing time.
To support faster-than-real-time grid processes, the transcoding software requires a proper essence management process, via muxing and demuxing, as well as low-level knowledge of the formats being gridded. Some transcoding software is marketed as being gridded or faster than real-time. Often this means the software can handle multiple files being processed concurrently in a batch mode across multiple CPUs and multiple servers. This is not a truly gridded processing model, which can scale the processing speed of individual transcodes as more CPU resources are applied. Transcoding software that can optimize performance on a per-file basis does so by slicing the source media at various durations, meeting production turnaround times driven by baseband linear process expectations. Being able to do this on a per-file basis provides more flexibility over the control of the workflow.
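The slicing step can be sketched as a simple planning function that divides a file's duration into contiguous segments, one per available worker. This is a simplified illustration with names of our own (`slice_plan`, `min_slice_s`); a real implementation would likely need to align slices to frame or GOP boundaries, which is part of the low-level format knowledge mentioned above.

```python
from typing import List, Tuple


def slice_plan(duration_s: float, workers: int, min_slice_s: float = 10.0) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into contiguous (start, end) slices.

    Uses one slice per worker where possible, but never makes slices shorter
    than min_slice_s (a hypothetical tuning parameter).
    """
    count = max(1, min(workers, int(duration_s // min_slice_s)))
    length = duration_s / count
    return [
        (i * length, duration_s if i == count - 1 else (i + 1) * length)
        for i in range(count)
    ]


plan = slice_plan(3600.0, 8)
print(plan[0], plan[-1])
```

Each slice can then be transcoded on a different node and the results concatenated, so that adding nodes shortens the turnaround of a single file rather than only increasing the number of files processed at once.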
In addition, grid transcoding provides better utilization of the transcoding farm when transcoding content of mixed lengths. Rather than waiting for a single hour-long movie to finish while the rest of the farm sits idle, grid transcoding enables the source content to be load-balanced across the entire farm, providing the best efficiency and speed possible.
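A back-of-the-envelope model shows why mixed-length content benefits. The sketch below compares the total completion time when whole files are assigned to nodes with an idealized grid in which every file is split evenly across all nodes. The job durations are made-up numbers, and the grid model ignores slicing and reassembly overhead, so it represents a best case.

```python
import heapq
from typing import List


def per_file_makespan(durations: List[float], nodes: int) -> float:
    """Whole files go to the next free node; each file runs on one node only."""
    free_at = [0.0] * nodes
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


def grid_makespan(durations: List[float], nodes: int) -> float:
    """Idealized grid: every file is sliced evenly across all nodes, no overhead."""
    return sum(durations) / nodes


# Single-node transcode times in minutes: one long file and three short ones.
jobs = [60.0, 5.0, 5.0, 5.0]
print(per_file_makespan(jobs, 4))  # 60.0
print(grid_makespan(jobs, 4))      # 18.75
```

With per-file distribution, three of the four nodes finish after five minutes and then sit idle for the remaining 55; in the idealized grid, all four stay busy until the work is done.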
Media Transformation and Automation
As staff decreases while production requirements increase, broadcasters are exceeding the needs of typical file-to-file transcoding and are moving into the areas of clip concatenation, track-based media assembly and composition, frame-accurate video and audio mixing, and such track-based video and audio effects as dissolves, cuts and fades. In response, developers are starting to offer transcoding solutions with capabilities previously available only in non-linear editing systems. These capabilities are now possible during the actual transcoding process, not only in pre-processing. We have termed this level of automated processing multitrack assembly.
Multitrack assembly actions are defined by workflow templates, which are applied to incoming content in order to render a finished, composited, mixed output file without taking up the valuable time of a video editing workstation. Media composition works seamlessly within a distributed architecture, with capacity scaling in tandem with the expanding compute resources. The multitrack assembly engine supports such transformative elements as the stitching together of multiple whole or partial media assets without using intermediate files. An example of this would be to stitch NTSC color bars, a promotional interstitial, the master asset (movie, TV show, etc.), a trailing interstitial and a trailing black slug. During transcode, the source video frames are rescaled and/or cropped into the output video stream, and the source video adapted to the output frame rate by dropping or duplicating frames. These elements can be combined even if the original sources were not the same.
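Frame rate adaptation by dropping or duplicating frames can be expressed as a mapping from each output frame to the source frame that should fill it. The sketch below is one simple way to compute that mapping, choosing the source frame whose timestamp is at or just before the output frame's timestamp. It is not necessarily the method any given product uses, and `source_frame_indices` is a name of our own.

```python
from fractions import Fraction
from typing import List


def source_frame_indices(source_fps: Fraction, output_fps: Fraction, output_frames: int) -> List[int]:
    """For each output frame, pick the source frame at or just before its timestamp."""
    return [int(n * source_fps / output_fps) for n in range(output_frames)]


# Raising the rate duplicates some source frames.
print(source_frame_indices(Fraction(24), Fraction(30), 10))
# Lowering the rate drops source frames.
print(source_frame_indices(Fraction(60), Fraction(30), 5))
```

In the 24 to 30 case, source frames 0 and 4 each appear twice within the first ten output frames; in the 60 to 30 case, every other source frame is skipped. Using `Fraction` keeps rates such as 30000/1001 exact.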
Another use of multitrack assembly is to support multiple composited video tracks, which can use 32-bit alpha channels or opacity assignment. Each track can be sourced from a video or image asset, or be generated dynamically from a graphics or text-rendering engine. Assets can be assembled into multiple tracks at any point in time and duration with no requirement for end-to-end assembly. These video tracks can be assembled frame-by-frame, compositing the frames together based on their opacity parameters, and then transcoded into the appropriate output codec.
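Compositing by opacity comes down to a weighted blend applied to each pixel of each frame. The sketch below shows that blend for a single frame, using one opacity value per track and tiny grayscale frames to keep the example readable. A per-pixel alpha channel would apply the same formula with a different opacity at each pixel. The data is made up for illustration.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # one grayscale value per pixel, from 0.0 to 1.0


def composite(tracks: Sequence[Tuple[Frame, float]]) -> Frame:
    """Blend tracks bottom to top; each track is (frame, opacity in 0..1).

    The opacity of the bottom track is ignored, since nothing lies beneath it.
    """
    base, _ = tracks[0]
    out = list(base)
    for frame, opacity in tracks[1:]:
        out = [opacity * top + (1.0 - opacity) * bottom
               for top, bottom in zip(frame, out)]
    return out


background = [0.0, 0.0, 1.0, 1.0]
logo = [1.0, 1.0, 0.0, 0.0]
print(composite([(background, 1.0), (logo, 0.25)]))
```

Running the blend once per output frame, and then handing the result to the encoder, corresponds to the frame-by-frame assembly described above.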
The multitrack assembly model supports multiple mono audio tracks, which can be sourced from any channel within one or more audio assets. The final assembly can contain, for example, tracks Lt and Rt from an English language audio asset and tracks Lt and Rt from a Spanish language audio asset, which get transcoded into a four-channel audio stream in the output file.
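The English and Spanish example amounts to a routing table that names, for each output channel, the asset and channel it is sourced from. The sketch below uses made-up sample values and hypothetical names (`ROUTING`, `assemble_channels`) to show how such a table could drive the assembly of a four-channel stream.

```python
from typing import Dict, List, Tuple

# Hypothetical assets: each maps a channel label to its mono samples.
assets: Dict[str, Dict[str, List[float]]] = {
    "english": {"Lt": [0.1, 0.2], "Rt": [0.3, 0.4]},
    "spanish": {"Lt": [0.5, 0.6], "Rt": [0.7, 0.8]},
}

# Output channel order: (asset name, channel label) for each output channel.
ROUTING: List[Tuple[str, str]] = [
    ("english", "Lt"), ("english", "Rt"),
    ("spanish", "Lt"), ("spanish", "Rt"),
]


def assemble_channels(sources: Dict[str, Dict[str, List[float]]],
                      routing: List[Tuple[str, str]]) -> List[Tuple[float, ...]]:
    """Interleave the routed mono tracks into one multi-channel frame per sample."""
    tracks = [sources[name][channel] for name, channel in routing]
    return [tuple(samples) for samples in zip(*tracks)]


frames = assemble_channels(assets, ROUTING)
print(frames)
```

Changing the routing table is all that is needed to swap languages, reorder channels or build a stereo-only output from the same source assets.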
All of these aspects build on efficiencies, which help users handle much larger volumes of content across more environments with fewer resources. With the introduction of publishing tools into the media workflow solution, transcoding software can combine the metadata of the asset with dynamically assigned metadata, and then publish the asset and its metadata in a format for use with specific affiliates or retailers. A robust solution would support publishing plug-ins used to template the publishing requirements for a particular outlet, such as iTunes or a cable affiliate. These publishing requirements are not limited to the metadata, but can also include such related assets as artwork. This enables a full media package to be published to the destination in a fully automated fashion.
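A publishing template of this kind might be sketched as follows. Everything here is hypothetical: the template fields, the naming patterns and the sample metadata are invented for illustration and do not reflect the actual requirements of iTunes or any other outlet.

```python
from typing import Dict

# Hypothetical outlet template: required metadata fields and naming patterns.
OUTLET_TEMPLATE: Dict = {
    "required_fields": ["title", "episode", "release_date"],
    "media_name": "{title}_E{episode}.mxf",
    "artwork_name": "{title}_E{episode}.jpg",
}


def build_package(asset_metadata: Dict, dynamic_metadata: Dict, template: Dict) -> Dict:
    """Merge indexed and dynamically assigned metadata, then name the package files."""
    merged = {**asset_metadata, **dynamic_metadata}  # dynamic values win on conflict
    missing = [f for f in template["required_fields"] if f not in merged]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    return {
        "metadata": {f: merged[f] for f in template["required_fields"]},
        "files": [
            template["media_name"].format(**merged),
            template["artwork_name"].format(**merged),
        ],
    }


package = build_package(
    {"title": "Pilot", "episode": 1},   # metadata indexed from the asset
    {"release_date": "2011-01-15"},     # metadata assigned at publish time
    OUTLET_TEMPLATE,
)
print(package["files"])
```

Supporting a new outlet then becomes a matter of adding a template rather than changing the workflow itself.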
The combination of all these processes (format management, track and program level editing templates, the publishing of metadata and related assets and the definition of the destination profiles) gives users access to a full media automation platform that allows for the scheduling of release dates ahead of time. Not all transcoding software solutions support this deep integration between workflow stages, and those that do not cannot offer the fully integrated solution described here.
The Quality Factor
Quality is prized by everyone in an organization, from the junior engineer to the top executive. When content is monetized, quality is something that should rarely be sacrificed. Yet the constant pressure to deliver content to more outlets forces many organizations to take technical shortcuts, resulting in the sacrifice of quality for volume. As the old project management paradigm suggests, when evaluating cost, quality and performance, you can only pick two. The best media workflow solutions can offer customers all three.
Some key aspects to consider when evaluating any transcoding platform are pre-processing and post-processing functions. Before starting the transcoding process, the best software will check the file to make sure that all elements of the ingested file are correct, such as the frame size and the bitrate, to prevent errors downstream in the workflow. Quality control can be performed by an external application that can validate the quality control policies set for the project, or by an integrated solution which is seamless with your workflow solution. Successful completion of transcoding doesn’t always guarantee a conforming file. As with pre-transcode quality control, policy validation can also be performed as a post-process on the transcoded file.
Another way to ensure output quality is to apply filtering and enhancement during transcoding on the audio or video samples. The best transcoding solutions incorporate tools for sharpening the video, reducing the noise, or removing telecine artifacts, which optimally can be performed during the transcoding process.
For many broadcasters, audio loudness levels present a major challenge, especially with the impending CALM legislation, which will require them to stay within certain volume levels. In the software domain, there are few transcoding solutions that conform to industry standards such as ITU-R BS.1770. Those that do conform provide for loudness correction during the transcoding process, avoiding the need to pre-process the audio or run it through a hardware device in a hybrid approach. Transwrapping can be used in this case to pass through video streams of any type, re-encoding, applying loudness processing, upmixing or downmixing the audio sources while leaving the video intact.
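Once the programme loudness of an asset has been measured, the simplest form of correction is a static gain that moves the measurement onto the target. The sketch below shows only that arithmetic. It does not implement the ITU-R BS.1770 measurement itself, and the -24 LKFS default is an assumed target chosen for illustration.

```python
from typing import List


def loudness_gain_db(measured_lkfs: float, target_lkfs: float = -24.0) -> float:
    """Gain in dB that moves the measured programme loudness onto the target."""
    return target_lkfs - measured_lkfs


def apply_gain(samples: List[float], gain_db: float) -> List[float]:
    """Scale linear PCM samples by a gain expressed in dB."""
    factor = 10.0 ** (gain_db / 20.0)
    return [sample * factor for sample in samples]


gain = loudness_gain_db(-18.0)
print(gain)  # -6.0, so the audio is attenuated by 6 dB
```

Because only the audio samples change, this kind of correction pairs naturally with transwrapping: the video stream can be passed through untouched while the audio is adjusted and re-encoded.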
Conclusion
In summary, transcoding for broadcast and content production has become highly sophisticated since the early days of “new media.” Organizations facing the challenge of managing more formats for delivery to more outlets across more platforms, all with fewer personnel and other resources, must choose transcoding platforms offering high performance, reliability, efficiency and quality. All of these factors should be taken into account when selecting a transcoding platform for your organization. Cost is obviously a major element, but it needs to be balanced against the rich set of features offered by the right platform. This is the only way to find a solution that will work for the long run, not just the short term.
