Skip to content

Image Super-Resolution: DAT (Dual Aggregation Transformer) the new goat for Pythonistas? 🐍

As a Python aficionado, I sought to leverage my programming acumen to enhance cherished images. Armed with the powerful Pillow library, I embarked on this mission, only to be met with disappointment 😑 when the results from the conventional resize method yielded subpar quality – a far cry from the vibrant images I envisioned. In this article, I am excited to unveil a groundbreaking solution. Say goodbye to lackluster results and hello to superior image quality! 🫨

DAT light VS Bicubic

🤔 Why DAT?

Image super resolution and bicubic interpolation differ in their methods for improving image quality. Bicubic interpolation is a simple algorithm that adjusts pixel values to resize images, often leading to blurry outcomes when enlarging. Conversely, image super resolution, powered by deep learning models like CNNs, aims to create high-resolution images with enhanced details by learning from high-quality data.

DAT stands as one of the foremost image super-resolution algorithms. Presently, accessible and user-friendly open-source alternatives encompass:

These options necessitate an initial setup, like downloading weights. For ESRGAN, some code tinkering is needed to make it programmatically usable. Additionally, these alternatives are becoming dated — EDSR was released in 2017 and ESRGAN in 2018. The field of research in image super resolution has since progressed.

Image Super-Resolution on Set14 - 4x upscaling
Image Super-Resolution on Set14 - 4x upscaling (paperswithcode.com)

As machine learning engineers, data scientists, or even Python developers, we always seek the best, newest, most versatile, user-friendly, and fastest models, don't we? So, let's kick things off with DAT! 😎

🚀 How to use DAT?

Quickstart: upscale

For the installation according to your dependency manager:

PyPI
pip install pillow-dat
Poetry
poetry add pillow-dat

For the usage:

example.py
from PIL.Image import open

from PIL_DAT.Image import upscale

lumine_image = open(".github/lumine.png")
lumine_image = upscale(lumine_image, 2)

As you can see, DAT is usable through pillow-dat, a Pillow library extension. It provides an upscale of times 2, 3 and 4. But the best of all, it's just one line of code 😎.

Advanced: custom models

The library offers four versions of DAT models for advanced programmers. Here's an example for a scaling factor of 4:

  • DAT light with 573K parameters
  • DAT S with 11.21M parameters
  • DAT 2 with 11.21M parameters
  • DAT with 14.80M parameters

By default, the method PIL_DAT.Image.upscale utilizes the embarked DAT light models. However, if you're feeling adventurous or require even higher image quality, you can access these advanced versions using custom models:

example_custom_model.py
from PIL.Image import open

from PIL_DAT.dat_s import DATS

lumine_image = open(".github/lumine.png")
model = DATS("./dist/DAT_S_x4.pth", 4)  # Instantiate a reusable custom model instance
lumine_image = model.upscale(lumine_image)
lumine_image.show()

Please note that model weights in *.pth format are accessible via a Google Drive link provided on GitHub or PyPI.

By default, when you use the PIL_DAT.Image.upscale method, it loads the model, performs the upscaling, and clears the model from the RAM for you. For better performance, especially when calling this function multiple times for the same scaling factor, it's recommended to instantiate the DAT light model via custom models.

from PIL.Image import open

from PIL_DAT.dat_light import DATLight

lumine_image = open(".github/lumine.png")
model = DATLight(2)  # Instantiate a reusable DATLight custom model instance
lumine_image = model.upscale(lumine_image)
lumine_image.show()

📊 Benchmarks

DAT will be compared to OpenCV's top super resolution model, EDSR, and a commercial SaaS product, Img.Upscaler.

All benchmark results presented here are reproducible. For detailed implementation, please consult the following resources:

  • Located in the benchmarks folder on the official GitHub repository of pillow-dat.
  • Within the scripts folder that houses our personal source code, used for testing this library.

Speed

Performance benchmarks have been conducted on a computing system equipped with an Intel(R) CORE(TM) i7-9750H CPU @ 2.60GHz processor, accompanied by a 2 × 8 Go at 2667MHz RAM configuration. Below are the recorded results:

In seconds 320 × 320 640 × 640 960 × 960 1280 × 1280
DAT light (x2) 16.1 65.3 146.8 339.8
DAT light (x3) 14.3 61.7 - -
DAT light (x4) 14.0 63.0 - -

The results were compared against the renowned OpenCV library, utilizing its EDSR model known for delivering superior image quality.

In seconds 320 × 320 640 × 640 960 × 960 1280 × 1280
EDSR (x2) 25.6 112.9 264.1 472.8
EDSR (x3) 24.3 112.5 - -
EDSR (x4) 23.6 111.2 - -

Note: Since we don't have control over Img.Upscaler's hardware specifications, it's challenging to provide accurate speed benchmarks.

Quality

Let's delineate five key themes encompassing all image types: Abstract, Animal, Nature, Object, and People. Subsequently, we will employ each solution to upscale images under each respective theme.

Abstract
Abstract
Animal
Animal
Nature
Nature
Object
Object
People
People

While the images may initially seem identical, closer inspection or zooming in reveals subtle quality differences. Let's examine the zoomed images for a clearer view! 🔎

Abstract Zoomed
Abstract Zoomed
Animal Zoomed
Animal Zoomed
Nature Zoomed
Nature Zoomed
Object Zoomed
Object Zoomed
People Zoomed
People Zoomed

While informative, these tests may not encompass every scenario for each solution. Your images may vary, impacting results. For the most reliable assessment, try each solution yourself to form your own opinion! 😉

Alpha-channel-awareness

State-of-the-art super-resolution models typically only support RGB images, and this holds true for the original DAT models as well. The reason behind this is that datasets used for comparing models in research predominantly consist of RGB images. As a user, this can pose a challenge. However, pillow-dat offers a solution with its built-in post-processing logic, effortlessly handling any alpha channel for you! 🌟

Alpha-channel-awareness
Alpha-channel-awareness

In this example, we're just comparing the basic usage of each solution. While it's possible to manage the alpha channel manually for the OpenCV's EDSR case, it would require additional effort.

👑 Conclusion

This library is founded upon the pioneering research paper, "Dual Aggregation Transformer for Image Super-Resolution".

@inproceedings{chen2023dual,
    title={Dual Aggregation Transformer for Image Super-Resolution},
    author={Chen, Zheng and Zhang, Yulun and Gu, Jinjin and Kong, Linghe and Yang, Xiaokang and Yu, Fisher},
    booktitle={ICCV},
    year={2023}
}

We extend our heartfelt appreciation to the researchers. The researchers' groundbreaking contributions have inspired the development of this library, pushing forward image super-resolution. 🙏