Name: What Fuels Transformers in Computer Vision? Unraveling ViT's Advantages
Price: 174.95 DKK
Availability: InStock
Author: Tolga Topal
ISBN: 9783346993311

What Fuels Transformers in Computer Vision? Unraveling ViT's Advantages by Tolga Topal


About What Fuels Transformers in Computer Vision? Unraveling ViT's Advantages

Master's Thesis from the year 2022 in the subject Computer Sciences - Artificial Intelligence, grade: 7.50, Universidad de Alcalá, course: Artificial Intelligence and Deep Learning, language: English.

Abstract: Vision Transformers (ViT) are neural model architectures that compete with and exceed classical convolutional neural networks (CNNs) in computer vision tasks. ViT's versatility and performance are best understood through a backward analysis. In this study, we aim to identify, analyse and extract the key elements of ViT by tracing them back to the origin of Transformer neural architectures (TNA). We highlight the benefits and constraints of the Transformer architecture, as well as the foundational role of self-attention and multi-head attention mechanisms.

We now understand why self-attention might be all we need. Our interest in TNA has led us to consider self-attention as a computational primitive. This generic computational framework gives the Transformer flexibility in the tasks it can perform. Having gained a good grasp of Transformers, we went on to analyse their vision-applied counterpart, ViT, which is roughly a transposition of the original Transformer architecture to an image-recognition and image-processing context.

In computer vision, convolutional neural networks are considered the go-to paradigm. Because of their proclivity for vision, we naturally seek to understand how ViT compares to CNNs. Their inner workings turn out to be rather different. CNNs are built with a strong inductive bias, an engineering feature that enables them to perform well in vision tasks. ViTs have less inductive bias and must learn such structure (for example, convolution-like filters) by ingesting enough data. This makes Transformer-based architectures rather data-hungry but also more adaptable.

Finally, we describe potential enhancements to the Transformer, with a focus on possible architectural extensions. We discuss some exciting learning approaches in machine learning. The final part of our analysis leads us to ponder the flexibility of Transformer-based neural architectures. We argue that this feature might possibly be linked to their Turing-completeness.
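
The idea in the abstract of treating self-attention as a computational primitive applied to image patches can be illustrated with a small sketch. This is not code from the thesis: the function names `split_into_patches` and `self_attention` are hypothetical, the 4x4 grayscale image is toy data, and the query, key and value projections are assumed to be the identity (a real ViT learns these projections, adds position embeddings and uses several heads).

```python
import math

def split_into_patches(image, patch):
    """Split a square grayscale image (list of rows) into flattened patch vectors."""
    size = len(image)
    patches = []
    for top in range(0, size, patch):
        for left in range(0, size, patch):
            patches.append([image[top + r][left + c]
                            for r in range(patch) for c in range(patch)])
    return patches

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention.

    Assumption for illustration: queries, keys and values are the tokens
    themselves (identity projections), so nothing is learned here.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in tokens:
        # Similarity of this token to every token, including itself.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all value vectors.
        outputs.append([sum(w * value[i] for w, value in zip(weights, tokens))
                        for i in range(dim)])
    return outputs

# Toy 4x4 image, cut into four 2x2 patches that act as the input tokens.
image = [[0.0, 0.1, 0.9, 1.0],
         [0.1, 0.0, 1.0, 0.9],
         [0.5, 0.5, 0.2, 0.2],
         [0.5, 0.5, 0.2, 0.2]]
patches = split_into_patches(image, 2)
attended = self_attention(patches)
```

Every patch attends to every other patch regardless of distance, which is one way to see why this design carries less built-in locality bias than a convolution, whose filter only sees a small neighbourhood.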


Language: English
ISBN: 9783346993311
Publisher: Grin Verlag
Binding: Paperback
Number of pages: 44
Published: January 5, 2024
Edition: 24001
Dimensions: 148 x 4 x 210 mm
Weight: 79 g

Delivery time: 2-3 weeks.
Expected delivery: November 22, 2024

In stock

Regular price: 174.95 kr. + shipping
Member price: 173.95 kr.



