File size vs. archive: The race is on

Opinion
By Chris Luther, SGL
June 22nd 2016 at 3:05PM

The ballooning size of media files makes fast access to content increasingly difficult. Chris Luther, SGL director of professional services, offers a solution

Video file size has always been one of the most challenging elements of a media-centric, file-based workflow. Part of that challenge is consumer demand for the latest formats such as HD and UHD, as well as industry demand for super slow motion, high frame rates and uncompressed raw video file formats. This leads to a never-ending race between file sizes and the technology available to store them.

A single hour of uncompressed 4K video uses over four terabytes of disk space, and until quite recently you couldn’t fit a file of this size on a single physical hard disk. Even if you could move this file over the network at 800 megabytes per second, a stretch even for 10-gigabit networking and the highest-end production storage, an editor would still have to wait around 95 minutes for the full hour-long file to transfer. A way around this issue is to employ a proxy workflow using a content management system that can take advantage of partial file restore (PFR).
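The figures above can be checked with some back-of-the-envelope arithmetic. The frame geometry, bit depth and frame rate below are illustrative assumptions (UHD 3840x2160, 10-bit 4:2:2 at 60 fps), not values stated in the article, but they land in the same ballpark:

```python
# Rough check of the "over four terabytes" and transfer-time figures.
# Frame parameters are assumptions for illustration only.
WIDTH, HEIGHT = 3840, 2160          # UHD frame
BITS_PER_PIXEL = 20                 # 10-bit 4:2:2 chroma subsampling
FPS = 60
LINK_BYTES_PER_SEC = 800e6          # 800 MB/s network throughput

bytes_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL // 8
bytes_per_hour = bytes_per_frame * FPS * 3600
transfer_minutes = bytes_per_hour / LINK_BYTES_PER_SEC / 60

print(f"one hour of video: {bytes_per_hour / 1e12:.2f} TB")
print(f"transfer at 800 MB/s: {transfer_minutes:.0f} minutes")
```

With these assumed parameters the hour comes to about 4.48 TB and a little over 90 minutes of transfer time, consistent with the article's figures.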

When content is ingested, a low-resolution proxy file is created; an hour of proxy material at 36 megabits per second is roughly 15 gigabytes. This high-end 36-megabit proxy gives creatives the ability to make accurate decisions about the direction of the project. At the time of ingest, the same full-resolution file can be sent to the deep archive.
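The proxy figure checks out the same way:

```python
# One hour of proxy video at 36 megabits per second.
PROXY_BITS_PER_SEC = 36e6
SECONDS_PER_HOUR = 3600

proxy_bytes = PROXY_BITS_PER_SEC / 8 * SECONDS_PER_HOUR
print(f"{proxy_bytes / 1e9:.1f} GB ({proxy_bytes / 2**30:.1f} GiB)")
```

That is about 16 GB (or 15 GiB), a few hundred times smaller than the uncompressed original, which is what makes the proxy practical to move and browse on production storage.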

The deep archive system must be media-aware, capturing information from the media files to enable partial restore. Many video file types provide streaming and random-access functionality during recording, and a new challenge arises when restoring parts of these formats from a linear medium in a media archive. The all-important index (or frame lookup table), which provides file length information and is normally found in the header of the file, has to be placed at the end instead. The reason is simple: while the file is being written, its length is continually growing, so the index cannot be added until the recording has completed. This is what is meant by a low-latency format file: one that can be streamed to an editor or playout system before the end of the recording.
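A minimal sketch makes the constraint concrete: a writer appending frames as they arrive cannot know the frame offsets until recording finishes, so the index has to trail the essence, and a reader must seek to the end of the file to find it. The layout and field sizes here are invented for illustration; real wrappers such as MXF are far more involved:

```python
import struct

def write_recording(path, frames):
    """Append frames as they arrive; the index can only be written last."""
    offsets = []
    with open(path, "wb") as f:
        f.write(b"HDR0")                      # header: no length or index yet,
                                              # the file is still growing
        for frame in frames:
            offsets.append(f.tell())          # remember where each frame starts
            f.write(struct.pack("<I", len(frame)))
            f.write(frame)
        index_pos = f.tell()
        for off in offsets:                   # index (frame lookup table),
            f.write(struct.pack("<Q", off))   # only known once recording ends
        f.write(struct.pack("<QI", index_pos, len(offsets)))  # trailer

def read_index(path):
    """A reader must seek to the END of the file to find the index."""
    with open(path, "rb") as f:
        f.seek(-12, 2)                        # trailer: index position + count
        index_pos, count = struct.unpack("<QI", f.read(12))
        f.seek(index_pos)
        return [struct.unpack("<Q", f.read(8))[0] for _ in range(count)]
```

On a random-access disk that trailing seek is free; on linear tape, as described below, it is anything but.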


Once the file is moved to an archive and written to linear data tape (such as LTO), restoring part of the clip first requires finding the beginning of the file. Because no index information can be located at the start of the file, the data tape has to be read (at a slower read speed) all the way to the end of the file, where the index is stored, there being no other way to calculate the size of the clip. Once the index is retrieved, the archive management system has to rewind the tape to the start of the clip and move forward (at seek rate) to the start of the portion to be restored. Only then can the system begin restoring the content from the archive. Even with devices capable of random access, such as a Sony Optical Disc Archive (ODA) cartridge, the goal is to avoid busying the expensive drives and network with unnecessary data movement.

Another approach is to restore the whole clip from data tape to a random-access cache, then extract the desired part of the clip and rebuild the file’s metadata. Either process results in lengthy delays in retrieving what may be small segments of a larger clip.

The solution is to use a content management system that is capable of understanding the media as soon as the archive process starts, long before any user is likely to want to restore particular sections of the video. Built-in intelligence within the system’s data-handling components reads and logs the metadata held in the MXF or QuickTime wrapper as the file is written to the archive. This information is then passed to the system’s database for later use when only a small portion of the original material is required. As well as locating the required section in the archived file, this metadata is used during restoration to create a new, fully standards-compliant media file.
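The archive-time indexing idea can be sketched as follows: log each frame's byte offset while the file streams to the archive, then use that table to read only the byte range covering the frames an editor asks for. The fixed frame size, storage layout and helper names here are hypothetical simplifications, not SGL's actual API:

```python
# Hypothetical sketch of archive-time indexing plus partial restore.
frame_index = {}   # clip_id -> list of byte offsets, captured at archive time

def archive(clip_id, src, dst, frame_size):
    """Copy src to the archive while logging every frame's byte offset."""
    offsets, pos = [], 0
    while chunk := src.read(frame_size):
        offsets.append(pos)
        dst.write(chunk)
        pos += len(chunk)
    frame_index[clip_id] = offsets          # persisted to the CMS database

def partial_restore(clip_id, archive_file, first_frame, last_frame):
    """Read only the byte range covering the requested frames."""
    offsets = frame_index[clip_id]
    start = offsets[first_frame]
    end = offsets[last_frame + 1] if last_frame + 1 < len(offsets) else None
    archive_file.seek(start)
    return archive_file.read((end - start) if end is not None else -1)
```

Because the offsets are captured once, at archive time, no restore ever has to scan to the tail of the file to find the index: the system seeks straight to the requested range.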

By capturing essential metadata during archiving, the content management system avoids having to read or move the full media file from the archive media. The system transfers only what is needed to the destination device, saving both bandwidth and space on expensive production storage.

SGL’s Avid Interplay Web Services plug-in with Partial File Restore, powered by Glookast, enables editors to easily select and restore elements of a clip directly from the archive in high resolution. Prior to this, partial file restore functionality could only be achieved using the full Avid Interplay | Archive system.