Deep Learning Projects
Project 1: Chest Radiograph Pathology Classification
- Reference person: Raffaele Mineo (raffaele.mineo@unict.it)
- Goal: To develop a model capable of predicting multiple pathologies from chest radiographs.
- Dataset:
- Name: CheXpert
- Description: 224,316 chest radiographs of 65,240 patients labeled with 14 observations (e.g., Cardiomegaly, Edema).
- Link: CheXpert on Kaggle
- Size: 11.5 GB
- Methodology: Train a convolutional neural network. Key challenges include handling uncertain labels, potentially using techniques like label smoothing or incorporating hierarchical dependencies between pathologies (a masked-loss sketch follows this project).
- Performance evaluation: Evaluate multi-label classification performance (e.g., AUC per label, F1-score).
- Reference papers:
- Irvin, Jeremy, et al. “CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. No. 01. 2019.
- Reference/starting code: CheXpert Kaggle dataset and code
- Expected completion time: 40h
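As a concrete illustration of the uncertain-label handling mentioned in the methodology, here is a minimal PyTorch sketch of a multi-label loss that masks out uncertain entries, assuming the CheXpert convention of 1/0/-1 for positive/negative/uncertain labels (the paper’s “U-Ignore” policy). The DenseNet-121 backbone in the comment is only an assumption, not prescribed by the project.

```python
import torch
import torch.nn.functional as F

def masked_bce_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Multi-label BCE that ignores uncertain labels.

    logits: (B, 14) raw model outputs, one per CheXpert observation.
    labels: (B, 14) with 1 = positive, 0 = negative, -1 = uncertain.
    """
    labels = labels.float()
    mask = (labels >= 0).float()               # 1 where the label is certain
    targets = labels.clamp(min=0)              # -1 -> 0, but those entries are masked out
    loss = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1)

# Example backbone (assumption, not prescribed by the project):
# model = torchvision.models.densenet121(weights="DEFAULT")
# model.classifier = torch.nn.Linear(model.classifier.in_features, 14)
```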
Project 2: Brain Tumor Segmentation
- Reference person: Raffaele Mineo (raffaele.mineo@unict.it)
- Goal: To segment different sub-regions of brain tumors (enhancing tumor, necrosis, and edema) from multi-parametric MRI scans.
- Dataset:
- Name: BraTS-2021 (similar to BraTS2020)
- Description: Multi-parametric MRI (T1, T1c, T2, FLAIR) of 2,040 glioma patients with voxel-wise labels.
- Link: BraTS2020 Training Data on Kaggle (Note: Link is for BraTS2020, project mentions BraTS-2021)
- Size: 8 GB
- Methodology: Implement a 3D U-Net architecture (or variants like nnU-Net) for volumetric segmentation.
- Performance evaluation: Evaluate segmentation accuracy using Dice scores on validation data for each sub-region (a Dice computation sketch follows this project).
- Reference papers:
- Baid, Ujjwal, et al. “The RSNA-ASNR-MICCAI BraTS 2021 benchmark on brain tumor segmentation and radiogenomic classification.” arXiv preprint arXiv:2107.02314 (2021).
- Reference/starting code: BraTS2020 Kaggle dataset and code
- Expected completion time: 60h
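To make the per-sub-region Dice evaluation concrete, here is a small sketch assuming the usual BraTS label convention (1 = necrotic core, 2 = edema, 4 = enhancing tumor) and the standard composite regions; verify the label values against the release actually downloaded.

```python
import torch

def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> float:
    """Dice coefficient between two binary volumes of identical shape."""
    p, t = pred.bool(), target.bool()
    inter = (p & t).sum().item()
    return (2.0 * inter + eps) / (p.sum().item() + t.sum().item() + eps)

def brats_regions(label_volume: torch.Tensor) -> dict:
    """Composite evaluation regions built from a BraTS-style label map (assumed labels 1/2/4)."""
    return {
        "WT": label_volume > 0,                               # whole tumor
        "TC": (label_volume == 1) | (label_volume == 4),      # tumor core
        "ET": label_volume == 4,                              # enhancing tumor
    }

# per-region Dice:
# {k: dice_score(brats_regions(pred_labels)[k], brats_regions(gt_labels)[k]) for k in ("WT", "TC", "ET")}
```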
Project 3: Diabetic Retinopathy Detection
- Reference person: Raffaele Mineo (raffaele.mineo@unict.it)
- Goal: To classify fundus images into five severity levels of diabetic retinopathy.
- Dataset:
- Name: APTOS-2019 Blindness Detection
- Description: 3,662 fundus images labeled into five DR severity levels (No DR, Mild, Moderate, Severe, Proliferative).
- Link: APTOS-2019 on Kaggle
- Size: 8.6 GB
- Methodology: Fine-tune a pre-trained Convolutional Neural Network (e.g., EfficientNet). Apply data augmentation techniques and address class imbalance using methods like SMOTE.
- Performance evaluation: Evaluate 5-class classification performance (e.g., Quadratic Weighted Kappa, accuracy, precision/recall per class); a QWK evaluation sketch follows this project.
- Reference papers:
- Sikder, Niloy, et al. “Early blindness detection based on retinal images using ensemble learning.” 2019 22nd International conference on computer and information technology (ICCIT). IEEE, 2019.
- Reference/starting code: APTOS-2019 Kaggle dataset and code
- Expected completion time: 40h
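Quadratic Weighted Kappa, the metric used by the original APTOS-2019 challenge, is available directly in scikit-learn; a minimal evaluation sketch:

```python
from sklearn.metrics import cohen_kappa_score, classification_report

def evaluate_grades(y_true, y_pred):
    """y_true, y_pred: integer DR severity grades in {0, 1, 2, 3, 4}."""
    qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")
    print(f"Quadratic Weighted Kappa: {qwk:.4f}")
    print(classification_report(y_true, y_pred, digits=3))  # per-class precision/recall
    return qwk
```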
Project 4: Skin Lesion Classification
- Reference person: Raffaele Mineo (raffaele.mineo@unict.it)
- Goal: To classify dermoscopic images of skin lesions as benign or malignant.
- Dataset:
- Name: ISIC-2020 (similar to ISIC-2019)
- Description: 33,126 dermoscopic images (benign vs. malignant) confirmed via histopathology.
- Link: ISIC-2019 Skin Lesion Images on Kaggle (Note: Link is for ISIC-2019, project mentions ISIC-2020)
- Size: 9.8 GB
- Methodology: Build a classification pipeline, potentially using a ResNet backbone. Optionally, incorporate segmentation masks for lesion cropping prior to classification.
- Performance evaluation: Optimize and evaluate using ROC-AUC.
- Reference papers:
- Cassidy, Bill, et al. “Analysis of the ISIC image datasets: Usage, benchmarks and recommendations.” Medical image analysis 75 (2022): 102305.
- Reference/starting code: ISIC-2019 Kaggle dataset and code
- Expected completion time: 50h
Project 5: ECG Arrhythmia Classification
- Reference person: Raffaele Mineo (raffaele.mineo@unict.it)
- Goal: To classify 12-lead ECG signals for multi-label arrhythmia detection.
- Dataset:
- Name: PTB-XL
- Description: 21,799 clinical 12-lead ECGs (10s each) from 18,869 patients, annotated with up to 71 diagnostic statements (rhythm, form, diagnostic).
- Link: PTB-XL Dataset on Kaggle
- Size: 3.2 GB
- Methodology: Design a 1D-CNN or a Transformer-based model suitable for sequential ECG data (a 1D-CNN sketch follows this project).
- Performance evaluation: Compare performance across different diagnostic tasks (e.g., specific arrhythmia detection) using appropriate multi-label classification metrics.
- Reference papers:
- Wagner, Patrick, et al. “PTB-XL, a large publicly available electrocardiography dataset.” Scientific data 7.1 (2020): 1-15.
- Reference/starting code: PTB-XL Kaggle dataset and code
- Expected completion time: 40h
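A minimal 1D-CNN sketch for the methodology above, assuming the 100 Hz PTB-XL signals (10 s, i.e., 1,000 samples per lead) and a multi-label output over the diagnostic statements; the layer sizes are illustrative only.

```python
import torch
import torch.nn as nn

class ECG1DCNN(nn.Module):
    """Small 1D-CNN for 12-lead ECG; input shape (B, 12, 1000) at 100 Hz."""
    def __init__(self, num_classes: int = 71):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(12, 32, kernel_size=7, padding=3), nn.BatchNorm1d(32), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.BatchNorm1d(64), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=3, padding=1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(128, num_classes)   # multi-label logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).squeeze(-1))

# Train with nn.BCEWithLogitsLoss() against the multi-hot statement annotations.
```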
Project 6: Histopathologic Cancer Detection
- Reference person: Raffaele Mineo (raffaele.mineo@unict.it)
- Goal: To classify patches from lymph node histology images for the presence of metastatic tissue.
- Dataset:
- Name: PatchCamelyon (PCam)
- Description: 327,680 color image patches (96x96px) from lymph node histology, with binary labels for metastatic tissue.
- Link: PatchCamelyon on Kaggle
- Size: 11.5 GB
- Methodology: Train a CNN (e.g., DenseNet). Experiment with semi-supervised learning techniques or uncertainty estimation to improve detection robustness.
- Performance evaluation: Evaluate classification performance (e.g., AUC, accuracy) for patch-based metastasis detection.
- Reference papers:
- Veeling, Bastiaan S., et al. “Rotation equivariant CNNs for digital pathology.” Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Granada, Spain, September 16–20, 2018, Proceedings, Part II. Springer International Publishing, 2018.
- Reference/starting code: Kaggle Histopathological Cancer Detection Challenge GitHub
- Expected completion time: 40h
Project 7: Federated Adaptive Class-specific Prompt Tuning
- Reference person: Alessio Masano (alessio.masano@phd.unict.it)
- Goal: To adapt an existing federated multi-domain learning framework to support federated class-specific prompt tuning.
- Dataset:
- Name: Food-101, FGVC Aircraft, Oxford Pets (one to be chosen, or preferably use all three)
- Link: Food-101, FGVC Aircraft, Oxford Pets
- Size: Food-101 5GB, FGVC Aircraft 2.75GB, Oxford Pets 800MB
- Methodology: Modify and extend an existing federated learning framework to incorporate class-specific prompt tuning mechanisms for adapting models across different domains/clients (a prompt-aggregation sketch follows this project).
- Performance evaluation: Evaluate the effectiveness of class-specific prompt tuning in a federated setting, likely focusing on model performance on diverse data distributions and personalization.
- Reference papers:
- Shangchao Su, Mingzhao Yang, Bin Li, Xiangyang Xue. “Federated Adaptive Prompt Tuning for Multi-Domain Collaborative Learning.”
- Reference/starting code: FedAPT GitHub
- Expected completion time: 60h
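FedAPT defines its own adaptive aggregation; as a generic starting point, the sketch below shows plain FedAvg-style averaging restricted to the learnable prompt parameters (the frozen backbone never leaves the clients). Function name and weighting scheme are illustrative assumptions.

```python
import copy
import torch

@torch.no_grad()
def average_prompt_states(client_states: list, weights: list = None) -> dict:
    """FedAvg-style aggregation of per-client prompt parameters.

    client_states: list of state_dicts holding only the learnable prompt tensors
                   (e.g., one prompt per class); weights default to a uniform average,
                   or can be set proportional to local dataset sizes.
    """
    n = len(client_states)
    weights = weights or [1.0 / n] * n
    aggregated = copy.deepcopy(client_states[0])
    for key in aggregated:
        aggregated[key] = sum(w * s[key].float() for w, s in zip(weights, client_states))
    return aggregated
```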
Project 8: Federated Adaptive Prompt Tuning in medical field
- Reference person: Alessio Masano (alessio.masano@phd.unict.it)
- Goal: To adapt and apply an existing federated multi-domain framework (likely FedAPT) using medical datasets.
- Dataset:
- Name: MedMNIST v2 and MedIMeta (using domains with smaller size, e.g., Breast Ultrasound, Diabetic Retinopathy, Mammography, Skin Lesion Evaluation, etc.)
- Link: MedMNIST v2, MedIMeta Zenodo, MedIMeta GitHub
- Size: Varies depending on selected subsets.
- Methodology: Apply the federated adaptive prompt tuning framework to medical image datasets, focusing on challenges specific to medical data like privacy, heterogeneity, and domain shift.
- Performance evaluation: Assess the model’s performance on various medical tasks, its adaptability to different medical domains/institutions, and the benefits of prompt tuning in this context.
- Reference papers:
- Shangchao Su, Mingzhao Yang, Bin Li, Xiangyang Xue. “Federated Adaptive Prompt Tuning for Multi-Domain Collaborative Learning.”
- Reference/starting code: FedAPT GitHub (as a base)
- Expected completion time: 50h
Project 9: Object Counting in Aerial Images
- Reference person: Morteza Moradi (morteza.moradi@unict.it)
- Goal: To count objects (e.g., animals) in drone or satellite images.
- Dataset:
- Name: custom dataset of aerial images of animals
- Link: https://datadryad.org/dataset/doi:10.5061/dryad.8931zcrv8
- Size: 361.47 MB
- Methodology: Utilize YOLOv8 for initial object detection. Implement post-processing steps to refine counts, such as removing overlaps or applying clustering, potentially using segmentation-based refinement. Alternative approaches could involve density map estimation. A detection-and-count sketch follows this project.
- Performance evaluation: Evaluate counting accuracy (e.g., Mean Absolute Error, RMSE) and detection performance if applicable.
- Reference papers:
- (Implicit) Density map estimation approaches such as “Counting animals in aerial images with a density map estimation model.”
- Reference/starting code: YOLOv8 framework.
- Expected completion time: 60h
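Assuming the ultralytics YOLOv8 package is used (as suggested by the starting code), a minimal detection-and-count sketch looks like the following; the confidence and NMS thresholds are the first knobs to tune when duplicate or missed animals distort the counts.

```python
from ultralytics import YOLO

def count_objects(image_path: str, weights: str = "yolov8n.pt",
                  conf: float = 0.25, iou: float = 0.5) -> int:
    """Count detected objects in one aerial image.

    conf filters low-confidence boxes; iou is the NMS threshold that suppresses
    overlapping detections of the same animal. Replace `weights` with a model
    fine-tuned on the aerial animal dataset.
    """
    model = YOLO(weights)
    results = model.predict(image_path, conf=conf, iou=iou, verbose=False)
    return len(results[0].boxes)
```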
Project 10: Lightweight Object Detection
- Reference person: Morteza Moradi (morteza.moradi@unict.it)
- Goal: To train a lightweight bounding box object detector for a custom dataset.
- Dataset:
- Name: PASCAL VOC
- Link: N/A (can be imported through PyTorch)
- Size: 3 GB
- Methodology: Train an object detector using pretrained lightweight models such as MobileNet-SSD and/or YOLOv5n, and focus on adapting these models to a specific custom application (a torchvision starting point is sketched after this project).
- Performance evaluation: Evaluate object detection performance (e.g., mAP, speed) on the custom dataset.
- Reference papers:
- “Performance Analysis of YOLOv3, YOLOv4 and MobileNet SSD for Real Time Object Detection.”
- Reference/starting code: MobileNet-SSD and YOLOv5n implementations.
- Expected completion time: 50h
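As a sketch of the lightweight-detector starting point, torchvision ships both an SSDLite/MobileNetV3 model (COCO-pretrained, so it still needs fine-tuning for VOC or a custom class set) and a PASCAL VOC loader; this is only one possible baseline alongside YOLOv5n.

```python
import torch
import torchvision

# COCO-pretrained lightweight detector (SSDLite with a MobileNetV3 backbone)
model = torchvision.models.detection.ssdlite320_mobilenet_v3_large(weights="DEFAULT")
model.eval()

# PASCAL VOC straight from torchvision
dataset = torchvision.datasets.VOCDetection(
    root="data/voc", year="2012", image_set="val", download=True,
    transform=torchvision.transforms.ToTensor(),
)

image, target = dataset[0]
with torch.no_grad():
    prediction = model([image])[0]    # dict with 'boxes', 'labels', 'scores'
print(prediction["boxes"].shape, prediction["scores"][:5])
```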
Project 11: Medical Image Segmentation using Foundation Models
- Reference person: Federica Proietto Salanitri (federica.proiettosalanitri@unict.it)
- Goal: To identify a smaller, more efficient prompt encoder that, when combined with the frozen Segment Anything Model (SAM), achieves competitive segmentation performance on medical imaging tasks while reducing model size and computational cost.
- Dataset:
- Methodology: Adapt the Segment Anything Model (SAM) to medical images by exploring and integrating alternative, lightweight prompt encoders. The core SAM model is to remain frozen.
- Performance evaluation: Measure segmentation quality (e.g., Dice, IoU) and compare model size and computational cost against existing SAM adaptations like AutoSAM.
- Reference papers:
- Shaharabany, Tal, et al. “AutoSAM: Adapting SAM to Medical Images by Overloading the Prompt Encoder.”
- Reference/starting code: AutoSAM GitHub
- Expected completion time: 40h
Project 12: Aligned Federated Learning with B-cos networks
- Reference person: Rutger Hendrix (rutger.hendrix@phd.unict.it)
- Goal: To explore weight-input alignment promoted by B-cos models as an implicit regularizer in federated learning to mitigate overfitting on local non-IID data.
- Dataset:
- Name: Cifar10
- Link: Standard dataset (e.g., via PyTorch torchvision)
- Size: 170 MB
- Methodology: Start with pre-trained standard models distributed across clients. Apply B-cos transformations (“b-cosification”) to these local models and then fine-tune them in a federated manner. An optional extension is to incorporate federated adaptive B (hyperparameter) tuning based on local needs (interpretability or performance).
- Performance evaluation: Assess model performance on non-IID data distributions, degree of overfitting, and potentially interpretability gains.
- Reference papers:
- Dubost, Felix, et al. “B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable.”
- Reference/starting code: Requires implementing the B-cosification transformation within a federated learning framework.
- Expected completion time: 50h-60h
Project 13: Uncertainty-aware adaptive learning in Segmentation
- Reference person: Rutger Hendrix (rutger.hendrix@phd.unict.it)
- Goal: To investigate uncertainty-aware learning techniques to improve model performance on uncertain regions of interest in image segmentation.
- Dataset:
- Name: BraTS (Brain Tumor Segmentation dataset)
- Link: Standard dataset (e.g., from MICCAI BraTS Challenge)
- Size: N/A
- Methodology: Explore how targeted data augmentation strategies and/or specialized loss functions can be applied to areas where the model demonstrates high uncertainty during training. This aims to guide the model to dedicate more learning capacity to challenging areas.
- Performance evaluation: Evaluate segmentation performance, particularly in uncertain regions, and analyze the impact of uncertainty-guided techniques.
- Reference papers:
- Yang, Xiaoguo, et al. “UGLS: an uncertainty guided deep learning strategy for accurate image segmentation.”
- Reference/starting code: Requires implementing uncertainty estimation (e.g., evidential learning, Monte Carlo Dropout) and adaptive learning strategies (e.g., data augmentation, adaptive loss); a Monte Carlo Dropout sketch follows this project.
- Expected completion time: 50h
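A minimal Monte Carlo Dropout sketch for the uncertainty estimation mentioned above: dropout layers are kept active at inference, several stochastic passes are averaged, and the per-voxel variance serves as the uncertainty map that the targeted augmentation or loss weighting can exploit. Evidential learning would replace this block entirely.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def mc_dropout_uncertainty(model: nn.Module, volume: torch.Tensor, n_samples: int = 10):
    """MC Dropout uncertainty for a segmentation model.

    volume: (1, C, D, H, W) input; the model is assumed to output per-voxel logits.
    Returns the mean prediction and the per-voxel variance across samples.
    """
    model.eval()
    for m in model.modules():                 # re-enable only the dropout layers
        if isinstance(m, (nn.Dropout, nn.Dropout2d, nn.Dropout3d)):
            m.train()
    probs = torch.stack([torch.sigmoid(model(volume)) for _ in range(n_samples)])
    return probs.mean(dim=0), probs.var(dim=0)
```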
Project 14: Aligning human (eye-gaze) and model (saliency) attention
- Reference person: Rutger Hendrix (rutger.hendrix@phd.unict.it)
- Goal: To align human attention (measured by eye-gaze) with model attention (through saliency maps).
- Dataset:
- Name: MIMIC-GAZE-JPG
- Link: https://github.com/ukaukaaaa/GazeGNN?tab=readme-ov-file
- Size: 9 GB
- Methodology: Investigate different types of model saliency generation methods (e.g., layerwise relevance propagation, class activation mapping, guided backpropagation, occlusion methods) to identify those most suitable for training a classifier that is aligned with human saliency. This alignment is encouraged during training by incorporating an additional loss term that minimizes the difference between human and model saliency maps (a loss sketch follows this project).
- Performance evaluation: Measure the alignment/correlation between human eye-gaze data and model-generated saliency maps, and classification accuracy, using appropriate metrics.
- Reference papers:
- Boyd, Aidan, et al. “Cyborg: Blending human saliency into the loss improves deep learning-based synthetic face detection.”
- Reference/starting code: N/A (requires implementation of various saliency methods and comparison framework).
- Expected completion time: 40h
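CYBORG uses its own human-saliency loss formulation; the sketch below is a simpler stand-in that adds an MSE term between normalized model and human saliency maps to the classification loss, which is enough to prototype the alignment idea.

```python
import torch
import torch.nn.functional as F

def gaze_alignment_loss(logits, targets, model_saliency, human_saliency, lam: float = 0.5):
    """Cross-entropy plus a saliency-alignment term.

    model_saliency, human_saliency: (B, H, W) non-negative maps (e.g., CAMs and gaze
    heatmaps resized to the same resolution). Both are normalized to sum to 1 so that
    only their spatial distribution is compared.
    """
    ce = F.cross_entropy(logits, targets)
    m = model_saliency.flatten(1)
    h = human_saliency.flatten(1)
    m = m / m.sum(dim=1, keepdim=True).clamp(min=1e-8)
    h = h / h.sum(dim=1, keepdim=True).clamp(min=1e-8)
    return ce + lam * F.mse_loss(m, h)
```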
Project 15: Federated Subnetwork sharing
- Reference person: Rutger Hendrix (rutger.hendrix@phd.unict.it)
- Goal: To investigate ‘information separation’ in multi-input multi-output (MIMO) neural networks for federated learning, aiming to reduce communication costs by selectively sharing high-attribution subnetworks.
- Dataset:
- Name: Cifar10
- Link: Standard dataset (e.g., via PyTorch torchvision)
- Size: 170 MB
- Methodology:
- Convert a standard CNN architecture to a MIMO configuration.
- Analyze how class-specific information flows through the network’s bottleneck layers.
- Develop methods to identify high-attribution nodes (nodes crucial for specific classes).
- Demonstrate that selectively sharing only parameters related to these high-attribution nodes during federated learning can substantially reduce communication costs while maintaining model performance.
- Performance evaluation: Evaluate model classification accuracy, communication cost reduction, and the effectiveness of identifying and sharing class-attributing subnetworks.
- Reference papers:
- Havasi, Marton, et al. “Training independent subnetworks for robust prediction.”
- Reference/starting code: Requires a MIMO network implementation, a method to identify attributing nodes (e.g., via Taylor expansion), and a federated learning setup.
- Expected completion time: 60h
Project 16: Identifying Brain Regions with fMRI
- Reference person: Marco Finocchiaro (finocchiaro.marco@phd.unict.it)
- Goal: To design a model capable of classifying the brain region to which a 3D voxel-based cube extracted from fMRI signals belongs.
- Dataset:
- Name: Kamitani Lab dataset (referred to as “god-dataset” on Kaggle)
- Description: fMRI data.
- Link: God Dataset on Kaggle
- Size: 7.5 GB
- Methodology: Develop a deep learning model (likely a 3D CNN) that takes 3D fMRI voxel cubes as input and classifies them into predefined brain regions.
- Performance evaluation: Assess classification accuracy for identifying brain regions.
- Reference papers: Brain region atlasing
- Reference/starting code: N/A
- Expected completion time: 40h
- Can be extended to thesis
Project 17: Detecting Scalp Area from a Single EEG Electrode Signal
- Reference person: Marco Finocchiaro (finocchiaro.marco@phd.unict.it)
- Goal: To develop a data-driven pipeline capable of detecting specific scalp regions based on the signal captured by a single EEG electrode.
- Dataset:
- Methodology: Design a model (e.g., using signal processing and machine learning/deep learning techniques) to analyze single-electrode EEG signals and infer the corresponding scalp region.
- Performance evaluation: Evaluate the accuracy of scalp region detection.
- Reference papers: N/A
- Reference/starting code: N/A
- Expected completion time: 30-40h
Project 18: Disentangling Color Semantics in CLIP: Discovering and Evaluating Sparse Sub-Spaces for Object Color Representation
- Reference person: Giovanni Bellitto (giovanni.bellitto@unict.it)
- Goal: To identify, characterize, and validate directions or sparse sub-spaces in CLIP’s latent space that predominantly encode the color attribute of an object.
- Dataset:
- Name: N/A (will likely use CLIP models and probe them with various image/text inputs)
- Link: N/A
- Size: N/A
- Methodology:
- Assess if the transition from “base object” to “base object + color” is largely linear and consistent in CLIP’s latent space (a text-embedding probing sketch follows this project).
- Determine if a small set of coordinates (or a low-density linear combination) explains most of the variance associated with color without sacrificing CLIP’s retrieval/classification performance.
- Verify if the same color-encoding direction is shared (or highly similar) across text and image modalities.
- Performance evaluation: Evaluate the linearity of color representation, the sparsity of the color sub-space, the impact on CLIP’s core performance, and the alignment of color representation across modalities.
- Reference papers: N/A
- Reference/starting code: N/A (will involve working with pre-trained CLIP models).
- Expected completion time: 30h
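A minimal probing sketch for the linearity question, using the open-source CLIP package: it measures whether the text-embedding offset induced by adding a color word is consistent across objects. The object list, prompt template, and ViT-B/32 checkpoint are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

objects = ["car", "shirt", "cup", "chair", "umbrella"]
color = "red"

with torch.no_grad():
    base = model.encode_text(clip.tokenize([f"a photo of a {o}" for o in objects]).to(device))
    colored = model.encode_text(clip.tokenize([f"a photo of a {color} {o}" for o in objects]).to(device))
base = F.normalize(base.float(), dim=-1)
colored = F.normalize(colored.float(), dim=-1)

deltas = colored - base                                   # per-object "add red" offsets
direction = F.normalize(deltas.mean(dim=0, keepdim=True), dim=-1)
consistency = F.cosine_similarity(deltas, direction)      # near 1 everywhere => near-linear, shared shift
print(consistency)
```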
Project 19: Applying registers to B-Cos Networks
- Reference person: Amelia Sorrenti (amelia.sorrenti@phd.unict.it)
- Goal: To apply the concept of “registers” (from Vision Transformers) to B-cosified Vision Transformers (ViTs).
- Dataset:
- Dataset choice might depend on the specific ViT and B-Cos models used, possibly ImageNet or similar.
- Methodology: Integrate register tokens into the architecture of B-cosified Vision Transformers and evaluate their impact.
- Performance evaluation: Assess changes in model performance, interpretability, or other relevant metrics due to the introduction of registers in B-Cos ViTs.
- Reference papers:
- “Vision Transformers Need Registers” [https://arxiv.org/abs/2309.16588]
- “B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable” [https://arxiv.org/abs/2306.10898]
- Reference/starting code:
- ViT model with registers: https://github.com/kyegomez/Vit-RGTS
- B-cos ViTs: https://github.com/B-cos/B-cos-v2
- Expected completion time: 30h
- This project is recommended for students with personal computing resources or thesis interest.
Project 20: Continual Learning with registers
- Reference person: Amelia Sorrenti (amelia.sorrenti@phd.unict.it)
- Goal: To evaluate the impact of using “registers” (as in Vision Transformers) in a class-incremental continual learning setting.
- Dataset:
- Standard datasets for evaluating robustness and continual learning.
- Name: Imagenet-R (ImageNet-Renditions), CIFAR-100
- Link:
- Imagenet-R: https://github.com/hendrycks/imagenet-r
- CIFAR-100: https://www.cs.toronto.edu/~kriz/cifar.html
- Size:
- Imagenet-R: 2 GB
- CIFAR-100: 163 MB
- Methodology: Implement or adapt a continual learning setup (class-incremental) for a Vision Transformer model that incorporates registers.
- Performance evaluation: Assess continual learning metrics such as average accuracy, forgetting, and intransigence, comparing models with and without registers (a metric-computation sketch follows this project).
- Reference papers:
- “Vision Transformers Need Registers” [https://arxiv.org/abs/2309.16588]
- A Survey of Continual Learning: [https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10444954]
- Reference/starting code:
- Requires setting up a continual learning experiment with ViTs and registers.
- A practical starting point for working with continual learning: https://github.com/aimagelab/mammoth
- This framework already includes dataloaders for the suggested datasets.
- ViT model with registers: https://github.com/kyegomez/Vit-RGTS
- Expected completion time: 30h
- This project is recommended for students with personal computing resources or thesis interest.
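For the evaluation, the two most common class-incremental metrics can be computed from the task-accuracy matrix logged after each training stage; a small sketch follows (the intransigence term additionally needs joint-training reference accuracies, which are omitted here).

```python
import numpy as np

def average_accuracy_and_forgetting(acc: np.ndarray):
    """acc[i, j] = accuracy on task j measured after training on task i (T x T, lower triangle filled).

    Average accuracy: mean accuracy over all tasks after the final stage.
    Forgetting: mean drop from each old task's best past accuracy to its final accuracy.
    """
    T = acc.shape[0]
    avg_acc = acc[-1, :].mean()
    forgetting = np.mean([acc[:T - 1, j].max() - acc[-1, j] for j in range(T - 1)])
    return avg_acc, forgetting
```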
Project 21: Dataset Distillation for CL
- Reference person: Matteo Pennisi (matteo.pennisi@unict.it)
- Goal: To create a buffer of distilled images using dataset distillation techniques and evaluate whether this improves continual learning (CL) performance.
- Dataset:
- Methodology: Implement dataset distillation techniques to create a small, synthetic dataset (buffer) that summarizes a larger dataset. Use this distilled buffer in a continual learning scenario (e.g., as a memory replay mechanism).
- Performance evaluation: Evaluate standard continual learning metrics (accuracy, forgetting) when using the distilled dataset buffer compared to other CL strategies or no buffer.
- Reference papers:
- Expected completion time: 30h
Project 22: Dataset Distillation for FL
- Reference person: Matteo Pennisi (matteo.pennisi@unict.it)
- Goal: To apply dataset distillation in a federated learning (FL) context, where each client creates a distilled version of its local dataset, and the server fine-tunes the global model on this aggregated distilled data.
- Dataset:
- Methodology:
- Each client in a federated network applies dataset distillation to its local data.
- Clients send their (small) distilled datasets to the server.
- The server aggregates these distilled datasets.
- The global federated model (e.g., obtained via FedAvg) is fine-tuned on the server using this aggregated distilled data before being shared back with clients.
- Performance evaluation: Evaluate FL performance (e.g., global model accuracy, communication efficiency) when using distilled datasets compared to standard FL approaches.
- Reference papers:
- Expected completion time: 30h
Project 23: GenAI to explain CNN
- Reference person: Matteo Pennisi (matteo.pennisi@unict.it)
- Goal: To use generative AI (specifically Stable Diffusion) and lexical databases (Core WordNet) to explain the activations of a Convolutional Neural Network (CNN).
- Dataset:
- Name: N/A (the “dataset” would be images generated by Stable Diffusion using Core WordNet words as prompts).
- Methodology:
- Generate images using Stable Diffusion with words from Core WordNet synsets as prompts.
- Pass these generated images through a pre-trained CNN.
- Analyze the CNN’s activations to identify which WordNet synsets (and corresponding concepts/words) are most activated by the generated images, thereby providing an explanation for what the CNN “understands” or associates with those synsets.
- Performance evaluation: Qualitative and potentially quantitative assessment of how well the identified activated synsets explain the CNN’s behavior or learned features.
- Reference papers:
- Reference/starting code: N/A (requires using Stable Diffusion, WordNet, and analyzing CNN activations).
- Expected completion time: 30h
Project 24: Static Images Network analyzer
- Reference person: Matteo Pennisi (matteo.pennisi@unict.it)
- Goal: To create a dataset specifically designed to detect common biases (e.g., color, background) in image classification models and use the analysis results to generate a report for each class, potentially using an LLM.
- Dataset:
- Name: Custom-created dataset for bias detection.
- Link: N/A
- Size: N/A
- Methodology:
- Design and create a dataset where image attributes like object color, background, etc., are systematically varied while the main object class remains the same (or vice-versa) to probe for biases.
- Test a pre-trained image classification network on this dataset.
- Analyze the network’s performance variations based on these controlled attributes to identify biases.
- Use an LLM to generate a summary report for each class, detailing detected biases.
- Performance evaluation: The “performance” is the quality and insightfulness of the bias detection and the generated reports.
- Reference papers: N/A
- Reference/starting code: N/A
- Expected completion time: 30h
Project 25: Automatic check of explainability method using segmentation maps
- Reference person: Matteo Pennisi (matteo.pennisi@unict.it)
- Goal: To automatically evaluate the quality of an explainability method (Grad-CAM) by measuring the overlap between its generated saliency map and ground-truth segmentation masks or bounding boxes.
- Dataset:
- Name: PASCAL VOC
- Link: N/A (available from PyTorch)
- Size: 3 GB
- Methodology:
- For a given image and a target class, generate an explainability map using Grad-CAM.
- Obtain a ground-truth segmentation mask or bounding box for the object of interest in the image.
- Quantitatively measure the overlap (e.g., using IoU or pointing game metrics) between the Grad-CAM saliency map and the ground truth (a sketch of both metrics follows this project).
- Performance evaluation: The metric is the degree of overlap, which serves as a proxy for the faithfulness/accuracy of the Grad-CAM explanation.
- Reference papers: N/A
- Reference/starting code: N/A (requires Grad-CAM implementation and evaluation pipeline).
- Expected completion time: 30h
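A sketch of the two overlap measures named above, assuming the Grad-CAM map has already been resized to the image resolution; the 0.5 threshold on the normalized map is a common but arbitrary choice worth ablating.

```python
import numpy as np

def saliency_mask_overlap(saliency: np.ndarray, mask: np.ndarray, threshold: float = 0.5):
    """Compare a Grad-CAM map with a ground-truth binary mask (both H x W).

    Returns the IoU of the thresholded saliency map with the mask, and the
    pointing-game hit (True if the saliency peak falls inside the object).
    """
    sal = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
    sal_bin = sal >= threshold
    gt = mask.astype(bool)
    iou = (sal_bin & gt).sum() / max((sal_bin | gt).sum(), 1)
    peak = np.unravel_index(sal.argmax(), sal.shape)
    return iou, bool(gt[peak])
```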
Project 26: Audio-Image Alignment
- Reference person: Simone Carnemolla (simone.carnemolla@phd.unict.it)
- Goal: To align audio and image embeddings within a shared embedding space and perform zero-shot audio-image pair identification.
- Dataset:
- Name: AVE (Audio-Visual Event dataset) or similar
- Link: N/A (Standard dataset for audio-visual learning)
- Size: N/A
- Methodology:
- Training: Train a model (e.g., using two encoders for audio and image, and a contrastive loss) to map audio and image inputs into a common embedding space such that corresponding pairs are close and non-corresponding pairs are distant (a contrastive-loss sketch follows this project).
- Zero-shot Inference (Pair-wise Identification):
- For each audio sample in the test set, select its corresponding image and C-1 randomly chosen distractor images from the test set (C is the number of classes).
- Calculate the embedding similarity (e.g., cosine similarity) between the audio sample’s embedding and each of the C selected image embeddings.
- The model’s prediction is the audio-image pair with the highest similarity score.
- Performance evaluation: Pair-wise identification accuracy: a prediction is correct if the highest similarity occurs between the audio-image pair that belongs to the same class.
- Reference papers:
- (Implicit) CLIP-CLAP or similar audio-visual alignment papers.
- Reference/starting code: https://github.com/openai/CLIP
- Expected completion time: 30h-40h
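A minimal sketch of the CLIP-style symmetric contrastive loss referred to in the methodology, assuming a batch where row i of the audio and image embeddings comes from the same event. At inference, the same normalized embeddings give the cosine similarities used for the pair-wise identification protocol.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(audio_emb, image_emb, temperature: float = 0.07):
    """Symmetric InfoNCE over a batch of matched audio-image pairs.

    audio_emb, image_emb: (B, D) outputs of the two encoders; the diagonal of the
    similarity matrix holds the positive pairs, everything else is a negative.
    """
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(image_emb, dim=-1)
    logits = a @ v.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```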
Project 27: Textual Backward
- Reference person: Simone Carnemolla (simone.carnemolla@phd.unict.it)
- Goal: To investigate the decision-making process of a visual classifier using a training-free framework that simulates a training process through textual instructions and iterative prompt refinement with an LLM.
- Dataset:
- Name: Imagenet 1k (specifically, Salient Imagenet or a subset of 5 biased classes from ImageNet).
- Link: Standard ImageNet dataset.
- Methodology:
- Setup: Visual Classifier (e.g., RobustResNet50), LLM for prompt generation/refinement.
- Optimization Pipeline (per class):
- LLM generates an initial textual prompt for a target class.
- Generate an image from this prompt (e.g., using a text-to-image model).
- Classify the generated image using the visual classifier.
- Provide the classifier’s prediction, the target class, and the original prompt to the LLM to generate an improved prompt.
- Repeat for ‘n’ optimization steps.
- Inference Pipeline: Use the final optimized prompt to generate 100 images.
- Performance evaluation: Measure how many of the 100 images generated using the final optimized prompt activate the target class in the visual classifier. Success is based on developing system prompts for initial/refined generation and implementing the pipeline. Optional: Integrate an LLM for textual loss or auxiliary agents.
- Reference papers:
- (Implicit) TextGrad or similar papers on understanding models via generative approaches.
- Reference/starting code: N/A (requires interaction between LLMs, text-to-image models, and classifiers).
- Expected completion time: 30h-40h
- Can be extended to thesis
Project 28: Segmentation using bbox from GroundingSAM
- Reference person: Giovanni Patanè (patane.giovanni@phd.unict.it)
- Goal: To leverage bounding boxes obtained from GroundingSAM (or a similar open-world object detector) to perform or refine image segmentation.
- Dataset:
- Methodology: Integrate GroundingSAM (or a model that provides open-world object detection with bounding boxes based on text prompts) with a segmentation model. The bounding boxes could be used as prompts or constraints for a segmentation model (like SAM itself, or another architecture) to achieve more precise or class-specific segmentation.
- Performance evaluation: Evaluate segmentation performance (e.g., IoU, Dice score) based on the input prompts/bounding boxes.
- Reference papers:
- Reference/starting code: GroundingDINO, Grounding SAM, GOOSE Dataset: Image Processing
- Expected completion time: ~30-40h
Project 29: Classification Breast Cancer
- Reference person: Giovanni Patanè (patane.giovanni@phd.unict.it)
- Goal: To train or fine-tune a deep learning model to classify abnormal areas of tissue (benign vs. malignant) in mammography images.
- Dataset:
- Methodology:
- Preprocess mammography images from the CBIS-DDSM dataset.
- Train or fine-tune a deep learning model.
- Performance evaluation: Evaluate model performance using appropriate metrics (e.g., AUC for classification, IoU/Dice for segmentation).
- Reference papers:
- “Transfer Learning and Fine Tuning in Breast Mammogram Abnormalities Classification on CBIS-DDSM Database.”
- Expected completion time: 30h-40h
Project 30: Behaviour decoding from Mouse Visual Cortex
- Reference person: Salvatore Calcagno (salvatore.calcagno@unict.it)
- Goal: To determine if it’s possible to decode behaviour from two-photon (2p) calcium imaging signals.
- Dataset:
- Methodology: Given 2p traces (from SENSORIUM or similar), train a model to predict mouse behavioural variables (pupil size and position, speed).
- Performance evaluation:
- MSE between real and predicted behaviour
- Correlation between real and predicted behaviour
- Reference papers: https://arxiv.org/abs/2206.08666
- Expected completion time: 30h
Project 31: Image Classification from Brain Signals
- Reference person: Salvatore Calcagno (salvatore.calcagno@unict.it)
- Goal: To determine if it’s possible to classify the category of seen objects (e.g., animate vs. inanimate, nature vs. artificial) using only brain signals.
- Dataset:
- Methodology: Given brain representations of images (from SENSORIUM or similar), train a classifier to predict user-defined categories of the objects viewed by the subject. Images can also be used during the training phase to help learn the mapping or representations.
- Performance evaluation: Classification accuracy for the defined categories based on brain signals.
- Reference papers: https://arxiv.org/abs/2206.08666
- Expected completion time: 30h
Project 32: Brain Signal Quantization
- Reference person: Salvatore Calcagno (salvatore.calcagno@unict.it)
- Goal: To evaluate an advanced quantization method against a VQ-VAE baseline on 2-photon ΔF/F (calcium imaging) traces.
- Dataset:
- Methodology:
- Implement a VQ-VAE with a 512-entry codebook as a baseline (a quantization and codebook-usage sketch follows this project).
- Choose an advanced quantization method (e.g., from audio domain references) and replace the VQ-VAE’s quantization layer with it, keeping the same encoder/decoder.
- Train both models under identical conditions on 2-photon ΔF/F traces.
- Evaluate by training a linear decoder on the quantized representations to predict behavioral variables (e.g., eye position, running speed) or stimulus information.
- Performance evaluation: Report metrics such as:
- MSE on reconstruction of the ΔF/F traces.
- Codebook entropy (% of used codes).
- Performance of decoding behavioral or stimulus variables from the quantized representations.
- (Success is not strictly required; the focus is on the investigation.)
- Reference papers:
- Reference/starting code: https://github.com/SalvoCalcagno/quantformer2024
- Expected completion time: 40h+
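Two pieces that recur in the methodology, sketched with illustrative shapes: nearest-neighbour quantization with a straight-through gradient (the core of the VQ-VAE bottleneck, before any codebook/commitment losses are added) and the codebook-usage statistics requested in the evaluation.

```python
import torch

def quantize(latents: torch.Tensor, codebook: torch.Tensor):
    """Nearest-neighbour vector quantization with a straight-through estimator.

    latents: (N, D) encoder outputs; codebook: (K, D), e.g. K = 512.
    Returns the quantized vectors (gradients pass through to the encoder) and code indices.
    """
    dists = torch.cdist(latents, codebook)                 # (N, K) pairwise L2 distances
    indices = dists.argmin(dim=1)
    quantized = codebook[indices]
    quantized = latents + (quantized - latents).detach()   # straight-through trick
    return quantized, indices

def codebook_usage(indices: torch.Tensor, num_codes: int = 512):
    """Codebook entropy (bits) and the fraction of codes that are ever used."""
    counts = torch.bincount(indices, minlength=num_codes).float()
    probs = counts / counts.sum()
    nonzero = probs[probs > 0]
    entropy = -(nonzero * nonzero.log2()).sum().item()
    return entropy, (counts > 0).float().mean().item()
```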
Project 33: Few-shot Predictive Maintenance Benchmarking
- Reference person: Simone Palazzo (simone.palazzo@unict.it)
- Goal: To evaluate the effectiveness of different approaches in estimating the Remaining Useful Lifetime (RUL) of a power electronic device, given a few time series samples of a temperature sensor.
- Dataset:
- Name: Private dataset
- Description: 8 samples, ~400 timesteps per sample of temperature sensor data.
- Size: ~1 MB
- Methodology:
- Evaluate CNN and Transformer solutions, trained from scratch on the few-shot data.
- Evaluate CNN and Transformer solutions, using models pretrained on other (larger, possibly related) time series datasets and then fine-tuned.
- Propose and evaluate data augmentation and/or preprocessing procedures suitable for time series in a few-shot context (an augmentation sketch follows this project).
- Performance evaluation:
- Remaining Useful Lifetime (RUL) plots.
- Area Under the Curve (AUC) of RUL plot.
- Reference papers:
- Reference/starting code: Provided code covers dataset loading, baseline LSTM model training, and plot generation.
- Expected completion time: 30h
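One possible augmentation baseline for the few-shot setting, using generic 1D time-series transforms (Gaussian jitter, amplitude scaling, mild linear drift); these are illustrative choices, not procedures prescribed by the project.

```python
import numpy as np

def augment_temperature_trace(x: np.ndarray, n_aug: int = 20, seed: int = 0) -> np.ndarray:
    """Generate augmented copies of one (T,) temperature trace for few-shot training."""
    rng = np.random.default_rng(seed)
    T = len(x)
    augmented = []
    for _ in range(n_aug):
        jitter = rng.normal(0.0, 0.01 * x.std(), size=T)                   # sensor-noise-like perturbation
        scale = rng.uniform(0.95, 1.05)                                    # global amplitude scaling
        drift = np.linspace(0.0, rng.uniform(-0.02, 0.02) * x.std(), T)    # slow linear drift
        augmented.append(scale * x + jitter + drift)
    return np.stack(augmented)
```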