ShaDocNet

ShaDocNet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal

¹ University of Macau
² Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)
^*Indicates Equal Contribution
^†Indicates Corresponding Author

Abstract

Shadow removal improves the visual quality and legibility of digital copies of documents. However, document shadow removal remains an unsolved problem. Traditional techniques rely on heuristics that vary from situation to situation. Given the quality and quantity of current public datasets, the majority of neural network models are ill-equipped for this task. In this paper, we propose a Transformer-based model for document shadow removal that utilizes shadow context encoding and decoding in both shadow and shadow-free regions. Additionally, shadow detection and pixel-level enhancement are included in the overall coarse-to-fine process. In comprehensive benchmark evaluations, our model is competitive with state-of-the-art methods.

Visualization Results

Previous methods, both traditional (for example, the method of Shah *et al.*) and deep-learning-based (for example, AEFNet and BEDSR-Net), produce defects such as shadow edges, overexposure, and color fading. Our model, ShaDocNet, produces far fewer artifacts, and its output is much closer to the ground-truth shadow-free image.

Visual comparison of competing methods. From left to right:
(a) shadow image,
(b) ground truth,
(c) results of Wang *et al.*,
(d) results of Mask-ShadowNet,
(e) results of BEDSR-Net,
(f) results of ShaDocNet w/o RefineNet,
(g) results of ShaDocNet w/o Transformer encoder,
(h) results of ShaDocNet.

BibTeX

@inproceedings{chen2023shadocnet,
  title={Shadocnet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal},
  author={Chen, Xuhang and Cun, Xiaodong and Pun, Chi-Man and Wang, Shuqiang},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}

Acknowledgement

This work was supported in part by the University of Macau under Grant MYRG2022-00190-FST, in part by the Science and Technology Development Fund, Macau SAR, under Grant 0034/2019/AMJ, Grant 0087/2020/A2 and Grant 0049/2021/A, in part by the National Natural Science Foundation of China under Grant 62172403, and in part by the Distinguished Young Scholars Fund of Guangdong under Grant 2021B1515020019.

Method Overview

Model Architecture.

Quantitative Results

Quantitative comparisons of visual quality using RMSE, PSNR and SSIM.
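
For readers unfamiliar with these metrics, the following is a minimal sketch of how RMSE and PSNR are commonly computed between a restored image and its shadow-free ground truth. It is not the evaluation code used for the reported numbers: the function names, the 8-bit value range (`max_value=255.0`), and the toy pixel values are assumptions made for illustration, and the color space and protocol used in the paper are not specified here. SSIM is omitted because it needs windowed local statistics and is better taken from an established image-processing library.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equally sized flat pixel sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return math.sqrt(mse)


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the inputs are identical."""
    err = rmse(pred, target)
    if err == 0:
        return math.inf
    return 20.0 * math.log10(max_value / err)


# Toy 2x2 grayscale "images", flattened row by row (hypothetical values).
shadow_free = [200, 210, 220, 230]
restored = [198, 212, 219, 233]

print("RMSE:", rmse(restored, shadow_free))
print("PSNR (dB):", psnr(restored, shadow_free))
```

Lower RMSE and higher PSNR both indicate that the restored image is closer to the ground truth.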

Average OCR edit-distances.
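
As an illustration of the metric named in this caption, the sketch below computes the Levenshtein edit distance between two strings, which is one standard way to compare OCR output on a restored image against reference text. Whether the reported numbers use exactly this variant, and which OCR engine produced the text, is not stated here; the function name and the sample strings are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]


# Hypothetical OCR output versus reference text.
reference = "document shadow removal"
ocr_output = "docurnent shadow rernoval"
print(edit_distance(ocr_output, reference))
```

A smaller distance means the OCR text recovered from the processed image is closer to the reference, so averaging it over a test set gives a legibility-oriented complement to pixel-level metrics.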
