Abstract
Deep learning has become a major field with many applications, from face recognition to image generation and data compression. As a result, deep learning is becoming more and more integrated into our daily lives. We demonstrate how deep learning can be deployed for several applications in three different domains, namely improving business processes for agriculture, high-dimensional density estimation with generative models, and neural compression of data.
The first domain aims to optimize the business process of a seed breeding company operating in agriculture. To this end, we examine a dataset of white cabbage seedling images, with the aim of predicting whether a seedling will grow successfully based on an image alone. Accurate and early predictions can shorten a seedling's stay in the growth chamber, freeing space for other seeds to grow, and automating the process supports the professionals involved. We show that a particular convolutional neural network, AlexNet, outperforms the other machine learning methods considered and can accurately determine whether a seedling will grow successfully. Moreover, we observe that training AlexNet on images from earlier days generalizes to predictions on later days.
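As an illustration, the following is a minimal sketch of how such an image classifier could be set up with a pretrained AlexNet in PyTorch/torchvision; the binary head, optimizer, and input handling are illustrative assumptions, not the exact configuration used in the thesis.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained AlexNet and replace its final layer
# with a 2-class head (successful vs. unsuccessful seedling).
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    # images: (B, 3, 224, 224) seedling photos, labels: (B,) in {0, 1}
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```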
The second domain concerns generative modeling for high-dimensional density estimation, an open problem in deep learning. We aim to narrow the gap between the true data distribution and the distribution modeled by generative models. More concretely, we improve the performance of a class of generative models known as normalizing flows by constructing new methods and proposing an activation function, which we call Concatenated LipSwish. The resulting architecture, i-DenseNet, outperforms its predecessor Residual Flow and other comparable flow-based models in generative and hybrid modeling performance.
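A minimal sketch of the Concatenated LipSwish activation in PyTorch is given below; the normalizing constants (1.1 for LipSwish and roughly 1.004 for the concatenation) follow the published i-DenseNet formulation and should be treated as assumptions here rather than the exact thesis values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLipSwish(nn.Module):
    """Concatenated LipSwish: concatenates x and -x channel-wise, applies a
    Swish gate, and rescales so the map stays (approximately) 1-Lipschitz."""
    def __init__(self):
        super().__init__()
        # Unconstrained parameter; softplus keeps the Swish gate's beta positive.
        self.raw_beta = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        beta = F.softplus(self.raw_beta)
        x = torch.cat([x, -x], dim=1)          # doubles the channel dimension
        swish = x * torch.sigmoid(beta * x)
        # 1.1 bounds the Lipschitz constant of Swish; the extra ~1.004 accounts
        # for the concatenation (constants assumed from the published formulation).
        return swish / (1.1 * 1.004)
```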
The third and final domain covers neural compression of images and videos. With the growing amount of data worldwide, compression has become a fundamental part of data storage and transmission. We first examine a neural image compression model known as the mean-scale hyperprior. Although such models are effective in practice, they are limited in how well they optimize and generalize. We therefore introduce three new refinement methods that improve compression performance on a per-image basis: the latents of an already pre-trained image compression model are further optimized with the refinement procedures while the network's weights are kept fixed. We show that the method can be extended to three-class rounding, outperforms the baselines, can be used to move partly along the rate-distortion curve, and is robust to hyperparameter changes.
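A minimal sketch of the latent refinement idea follows, assuming the pretrained model exposes an analysis transform `encoder`, a synthesis transform `decoder`, and a `rate` estimate for the latents; these names, the straight-through rounding, and the hyperparameters are illustrative assumptions, not the thesis implementation.

```python
import torch

def refine_latents(encoder, decoder, rate, image, lmbda=0.01, steps=500, lr=1e-3):
    with torch.no_grad():
        y = encoder(image)                 # initial latents from the frozen encoder
    y = y.clone().requires_grad_(True)     # latents become the optimization variable
    opt = torch.optim.Adam([y], lr=lr)     # network weights stay fixed
    for _ in range(steps):
        opt.zero_grad()
        # Straight-through rounding: quantize in the forward pass,
        # let gradients flow through the identity in the backward pass.
        y_hat = y + (torch.round(y) - y).detach()
        x_hat = decoder(y_hat)
        distortion = torch.mean((image - x_hat) ** 2)
        loss = rate(y_hat) + lmbda * distortion   # rate-distortion objective
        loss.backward()
        opt.step()
    return torch.round(y.detach())
```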
Finally, we introduce a neural video compression model, based on scale-space flow, that allocates more bits to pre-specified regions of interest. We propose two versions that achieve this: an implicit model and a latent scaling model. Both models outperform all baselines in terms of rate-distortion performance in the regions of interest and generalize to different datasets at inference time. The latent scaling model performs best and can explicitly control the quantization bin width of the latent variables using only a single model at evaluation time. Further, we find that the models show a negligible performance gap when trained with synthetic region-of-interest masks, which do not correlate with the content of the video, compared to training with pixel-wise annotated masks.
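A minimal sketch of the latent scaling idea for region-of-interest coding is shown below; the gain values, mask handling, and function names are illustrative assumptions rather than the thesis configuration.

```python
import torch
import torch.nn.functional as F

def scale_quantize(y, roi_mask, gain_roi=1.0, gain_bg=4.0):
    # y: latents (B, C, h, w); roi_mask: (B, 1, H, W) binary mask in image space.
    mask = F.interpolate(roi_mask.float(), size=y.shape[-2:], mode="nearest")
    # Larger scale -> wider effective quantization bins -> fewer bits spent there.
    scale = gain_roi * mask + gain_bg * (1.0 - mask)
    y_hat = torch.round(y / scale) * scale   # quantize with position-dependent bin width
    return y_hat
```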
| Original language | English |
|---|---|
| Qualification | PhD |
| Awarding Institution | |
| Supervisors/Advisors | |
| Award date | 17 Oct 2024 |
| Print ISBNs | 9789464735505 |
| DOIs | |
| Publication status | Published - 17 Oct 2024 |
Keywords
- Deep learning
- generative modeling
- neural compression
- convolutional neural networks
- artificial intelligence
- machine learning