This repository contains the code and resources for our paper:
"Enhancing 1-Second SELD Performance with Filter Bank Analysis and SCConv Integration in CST-Former"
In this work, we address the limitations of current Sound Event Localization and Detection (SELD) systems in handling short time segments (specifically, 1-second windows), which is crucial for real-world applications requiring low latency and fine temporal resolution. We establish a new baseline for SELD performance on 1-second segments.
Our key contributions are:
- Establishing SELD performance on 1-second segments: Providing a new benchmark for short-segment analysis in SELD tasks.
- Comparative analysis of filter banks: Systematically comparing Bark, Mel, and Gammatone filter banks for audio feature extraction, demonstrating that Gammatone filters achieve the highest overall accuracy (a minimal feature-extraction sketch follows this list).
- Integration of SCConv modules into CST-Former: Replacing the convolutional components in the CST block with the SCConv module, yielding measurable F-score gains and enhanced spatial and channel feature representation.

[Figure: model architecture]
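As an illustration of the kind of filter-bank feature extraction being compared, here is a minimal log-Mel sketch using librosa. It is not the repository's exact pipeline (which lives in `cls_feature_class.py` and selects Bark/Mel/Gammatone via `params['filter']`); the sample rate, FFT size, hop length, and band count are illustrative assumptions.

```python
# A minimal sketch of log-Mel feature extraction, assuming librosa.
# NOT the repository's exact pipeline: the real implementation lives in
# cls_feature_class.py and switches between Bark, Mel, and Gammatone
# filter banks via params['filter']. All numeric values below are
# illustrative assumptions.
import numpy as np
import librosa

def log_mel_features(y, sr=24000, n_fft=1024, hop_length=480, n_bands=64):
    # Power spectrogram of the input waveform
    power_spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length)) ** 2
    # Mel filter bank; a Bark or Gammatone bank would be applied the same way
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_bands)
    return librosa.power_to_db(mel_fb @ power_spec)  # shape: (n_bands, n_frames)
```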

The repository is organized as follows:
- `cls_dataset/`
  - `cls_dataset.py`: PyTorch `Dataset` implementation for the training procedure; it aims to accelerate training.
- `models/`: source code for the different models.
  - `architecture/`: source code for CST-Former and SCConv CST-Former.
  - `baseline_model.py`: source code for SELDnet.
  - `conformer.py`: source code for Conv-Conformer.
- `parameters.py`: script containing all the training, model, and feature configurations. You can add new configurations for feature extraction and model architecture here. To change parameters or use a new configuration, create a sub-task with a unique id; check the code for examples, and see the hedged sketch after this list.
- `batch_feature_extraction.py`: a standalone wrapper script that extracts the features and labels and normalizes the training and test split features for a given dataset. Make sure you update the location of the downloaded datasets in `parameters.py` beforehand.
- `cls_compute_seld_results.py`: computes the metric results on your DCASE output format files.
- `cls_data_generator.py`: provides feature + label data in generator mode for validation and test.
- `cls_feature_class.py`: routines for label creation, feature extraction, and normalization. The filter bank option is set as an attribute of this class.
- `cls_vid_features.py`: extracts video features for the audio-visual task from a pretrained ResNet model. Our system does not implement the audio-visual track.
- `criterions.py`: custom loss functions and the Multi-ACCDOA implementation.
- `SELD_evaluation_metrics.py`: implements the metrics for the joint evaluation of detection and localization.
- `torch_run_vanilla.py`: a wrapper script that trains the model and calculates the metrics for each test dataset. Training stops when the F-score (see the paper) stops improving, with a patience of 50 epochs.
- `README.md`: project documentation.
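The following is a hedged sketch of the configuration pattern in `parameters.py`. The function name `get_params` and all keys and values shown are assumptions modeled on the DCASE baseline; consult the actual file for the real keys, defaults, and sub-task ids.

```python
# Hedged sketch of the parameters.py pattern (names/keys are assumptions
# modeled on the DCASE baseline; see the actual file for real defaults).
def get_params(argv='1'):
    params = dict(
        filter='mel',        # filter bank choice: 'bark' | 'mel' | 'gammatone'
        model='seld2024',    # model architecture to train
    )
    # Each unique argv id selects one sub-task configuration.
    if argv == '14':
        params['model'] = 'scconv_cst_former'   # SCConv CST-Former
    elif argv == '15':
        params['model'] = 'cst_former'          # CST-Former
    return params
```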
- Operating System: Linux recommended; the code has not been tested on Windows.
- Python: Version 3.11 or higher.
- Anaconda: Recommended for environment management.
Clone the Repository
```
git clone https://github.com/way2coder/DCASE2024.git
cd DCASE2024
```
Create a Conda Environment
```
conda create -n seld python=3.11
conda activate seld
```
Install Dependencies
Install the required Python packages using pip:

```
pip install -r requirements.txt
```

Alternatively, install using conda:

```
conda install --file requirements.txt
```
We use the [DCASE2024 Task 3] synthetic SELD mixtures for baseline training as the dataset for our experiments.
Download the Dataset
Download the development and evaluation datasets from the DCASE challenge website and place them in the `data/` directory. In `parameters.py`, set the parameter `datasets_dir_dic` to add the path to your dataset, and likewise set `feat_label_dir_dic`, which determines where all your `labels.npy` and `features.npy` files are saved (a hedged example follows).
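For example, the path parameters might look like the sketch below; the dictionary keys and directory layout are assumptions based on the parameter names above, so adapt them to where you actually placed the data.

```python
# Hedged example (inside parameters.py) of the dataset-path parameters;
# the dictionary keys and directory layout are assumptions -- adapt them.
params = {}
params['datasets_dir_dic'] = {
    'dcase2024_synth': 'data/DCASE2024_SELD_dataset/',  # downloaded audio
}
params['feat_label_dir_dic'] = {
    'dcase2024_synth': 'data/seld_feat_label/',  # where features.npy / labels.npy are saved
}
```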
Generate New Labels with Fine Resolution and Extract Audio Features
Run the preprocessing script, passing your configuration id as a command-line argument, for example:

```
python batch_feature_extraction.py 1
```
Typically, this will generate roughly 50 GB of feature files per filter bank with the default settings.
Data Augmentation (Optional)
Apply data augmentation techniques if needed; note that we do not implement any augmentation in this repository.
The configuration ids (argv numbers) for `parameters.py` are as follows.

| Model | Configuration ID | Filter Type | Parameters (M) |
|---|---|---|---|
| SCConv CST-Former | 14 | `params['filter']` | 0.57 |
| CST-Former | 15 | `params['filter']` | 0.54 |
| Conv-Conformer | 38 | `params['filter']` | 14.39 |
| SELD2024 | 1 | `params['filter']` | 0.84 |
Train different models:

```
python train_torch_vanilla.py 1
```

The training and test metrics and losses are written to the `results_audio/` folder; each unique setting in `parameters.py` generates a unique hash path for your run, and checkpoints are likewise saved to `models_audio/`. You can also use TensorBoard to monitor training progress. A hedged illustration of the hash-path idea follows.
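This sketch shows one way a run directory could be derived from the parameter dict; the repository's actual hashing scheme may differ.

```python
# Hedged illustration only -- the repository's actual hashing scheme
# may differ. Hashing the full configuration gives every unique
# parameter setting its own results folder.
import hashlib
import json

def results_dir(params, root='results_audio'):
    digest = hashlib.md5(json.dumps(params, sort_keys=True).encode()).hexdigest()[:8]
    return f'{root}/{digest}'

# e.g. results_dir({'filter': 'gammatone', 'model': 'cst_former'})
# -> 'results_audio/<8-char hash>'
```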
Most of our code comes from the DCASE2024 baseline system [1]; the CST-Former model code comes from the official implementation of CST-Former [2]; and the SCConv code comes directly from an unofficial implementation [3].
- [1] https://github.com/partha2409/DCASE2024_seld_baseline
- [2] Y. Shul and J.-W. Choi, "CST-Former: Transformer with Channel-Spectro-Temporal Attention for Sound Event Localization and Detection," in Proc. ICASSP 2024, IEEE, 2024, pp. 8686-8690.
- [3] https://github.com/cheng-haha/ScConv
For any questions or assistance, please contact:
- Name: Silhouette
- Email: [[email protected]]
Thank you for your interest in our work!