Tools | Eecs 351 Artist Iden

Tools

Frequency Domain Analysis
- The Fast Fourier Transform (FFT) is an algorithm for the frequency-domain analysis of discrete signals. It efficiently computes the discrete Fourier transform (DFT), which converts a time-domain signal into its frequency-domain components. The resulting spectrum shows which frequencies are present in the signal and at what magnitudes. This is important for tasks like identifying dominant frequencies, detecting harmonics, and understanding the overall frequency distribution within a signal.
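As a minimal sketch of finding a dominant frequency with the FFT, the Python/NumPy example below builds a synthetic test tone and reports the frequency of its strongest bin. The function name `dominant_frequency`, the 8 kHz sampling rate, and the test signal are illustrative assumptions, not part of the project code.

```python
import numpy as np

def dominant_frequency(signal, fs):
    """Return the frequency in Hz of the largest-magnitude FFT bin."""
    spectrum = np.fft.rfft(signal)                     # one-sided DFT of a real signal
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)   # bin k sits at k * fs / N Hz
    magnitudes = np.abs(spectrum)
    magnitudes[0] = 0.0                                # ignore the DC component
    return freqs[np.argmax(magnitudes)]

# Hypothetical test signal: a 440 Hz tone plus a weaker 880 Hz harmonic.
fs = 8000                      # assumed sampling rate in Hz
t = np.arange(fs) / fs         # one second of samples
x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)

peak_hz = dominant_frequency(x, fs)
```

With one second of audio the bins are 1 Hz apart, so `peak_hz` lands on the 440 Hz component.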

Bandpass Filters
- Bandpass filters isolate a specific frequency range within a signal. They allow a defined band of frequencies to pass through while attenuating all others, which makes them useful for extracting the relevant part of a signal when it is surrounded by noise at frequencies outside the desired range. Given the sampling frequency, a bandpass filter can be applied in the discrete frequency domain by converting each discrete frequency index to a frequency in Hz and then attenuating the components that fall outside the desired range. Bandpass filters can also be implemented in the time domain as finite impulse response (FIR) or infinite impulse response (IIR) filters.
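The frequency-domain approach described above can be sketched as follows in Python/NumPy. This is a simplified "brick-wall" mask that zeroes every bin outside the passband; the function name `fft_bandpass`, the sampling rate, and the test tones are assumptions for illustration. A practical design would more likely use an FIR or IIR filter with a smoother transition band.

```python
import numpy as np

def fft_bandpass(signal, fs, low_hz, high_hz):
    """Zero all FFT bins outside [low_hz, high_hz] and return the filtered signal."""
    spectrum = np.fft.rfft(signal)
    # Convert each discrete bin index k to a frequency in Hz: f_k = k * fs / N.
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum[~keep] = 0.0                       # discard everything outside the band
    return np.fft.irfft(spectrum, n=len(signal))

# Hypothetical signal: three tones, of which only the 1000 Hz tone is wanted.
fs = 8000
t = np.arange(fs) / fs
low = np.sin(2 * np.pi * 100 * t)
mid = np.sin(2 * np.pi * 1000 * t)
high = np.sin(2 * np.pi * 3000 * t)

y = fft_bandpass(low + mid + high, fs, low_hz=500.0, high_hz=1500.0)
```

In this idealized case each tone falls exactly on an FFT bin, so `y` matches the 1000 Hz tone up to rounding error.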

Notch Filters
- Notch filters target and remove a very narrow band of frequencies from a signal while leaving the rest of the signal essentially intact. They are particularly useful when there is unwanted noise or interference at a specific frequency. In an application like ours, a notch filter could be used to remove a specific instrument that occupies a narrow range of frequencies, or simply to remove its fundamental frequency.
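One common way to build a notch is a second-order IIR filter with zeros on the unit circle at the notch frequency and poles just inside it. The Python/NumPy sketch below assumes that design; the function name `notch_filter`, the pole radius `r`, and the 60 Hz hum example are illustrative choices rather than the project's implementation.

```python
import numpy as np

def notch_filter(signal, fs, notch_hz, r=0.98):
    """Second-order IIR notch: zeros at exp(+/-j*w0), poles at r*exp(+/-j*w0)."""
    w0 = 2.0 * np.pi * notch_hz / fs
    b = [1.0, -2.0 * np.cos(w0), 1.0]          # numerator (places the zeros)
    a = [1.0, -2.0 * r * np.cos(w0), r * r]    # denominator (places the poles)
    y = np.zeros(len(signal))
    for n in range(len(signal)):
        # y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]
        acc = b[0] * signal[n]
        if n >= 1:
            acc += b[1] * signal[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            acc += b[2] * signal[n - 2] - a[2] * y[n - 2]
        y[n] = acc
    return y

# Hypothetical example: a 440 Hz tone contaminated by 60 Hz hum.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
hum = 0.5 * np.sin(2 * np.pi * 60 * t)

cleaned = notch_filter(tone + hum, fs, notch_hz=60.0)
```

A pole radius closer to 1 gives a narrower notch but a longer start-up transient, so `r` trades selectivity against settling time.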

Spectrogram
- A spectrogram is a visual representation of how the frequency spectrum of a signal varies with time. The horizontal axis represents time, the vertical axis represents frequency, and the intensity of each point in the image represents the magnitude or power of the corresponding frequency at that time. We computed spectrograms so that we could use the frequency vector and the magnitudes of the complex spectrogram to generate fingerprints.
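A spectrogram can be computed by sliding a window along the signal and taking the FFT of each windowed frame (a short-time Fourier transform). The Python/NumPy sketch below is one way to do this; the function name `stft_spectrogram`, the Hann window, and the window and hop lengths are assumptions, and the output names S, F, and T are chosen only for readability.

```python
import numpy as np

def stft_spectrogram(signal, fs, window_len=256, hop=128):
    """Return (S, F, T): complex spectrogram, frequencies in Hz, frame times in s."""
    window = np.hanning(window_len)
    n_frames = 1 + (len(signal) - window_len) // hop
    # Each column is one windowed frame of the signal.
    frames = np.stack(
        [signal[i * hop : i * hop + window_len] * window for i in range(n_frames)],
        axis=1,
    )
    S = np.fft.rfft(frames, axis=0)                    # rows: frequency, cols: time
    F = np.fft.rfftfreq(window_len, d=1.0 / fs)        # frequency of each row
    T = (np.arange(n_frames) * hop + window_len / 2) / fs  # center of each frame
    return S, F, T

# Hypothetical signal: 500 Hz for the first half second, then 1500 Hz.
fs = 8000
t = np.arange(fs) / fs
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t))

S, F, T = stft_spectrogram(x, fs)
first_peak_hz = F[np.argmax(np.abs(S[:, 0]))]    # strongest frequency in first frame
last_peak_hz = F[np.argmax(np.abs(S[:, -1]))]    # strongest frequency in last frame
```

Plotting `np.abs(S)` against `T` and `F` would show the energy jump from 500 Hz to 1500 Hz halfway through the signal.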

Fingerprinting
- Fingerprinting identifies songs on the premise that each song generates a unique fingerprint. We followed a method used by researchers at the University of Rochester [1] to generate fingerprints. We start by computing the spectrogram of each audio sample with MATLAB's spectrogram function. This returns three outputs: S, the spectrogram matrix (the frequency content of the audio signal over time); F, the vector of frequencies corresponding to the rows of S; and T, the vector of time instants corresponding to the columns of S. We then run a fingerprint generation function that takes in S and F. It splits the frequencies in F into 6 logarithmically spaced bands and uses S to compute the maximum value in each band. We then apply a “max filter” using the ordfilt2() function to create a binary image with 1s at local maxima and 0s everywhere else. Finally, we convert the binary image to a binary vector, which is the fingerprint of the song.
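The steps above can be sketched roughly as follows in Python/NumPy. This is an illustrative reading of the description, not the project's MATLAB code: the function names, the 50 Hz lower band edge, the 3x3 neighborhood standing in for the max filter, and the choice to apply that filter to the per-band maxima are all assumptions.

```python
import numpy as np

def band_maxima(S, F, n_bands=6, f_min=50.0):
    """Maximum magnitude per log-spaced frequency band, for every time frame."""
    mag = np.abs(S)
    edges = np.geomspace(f_min, F[-1], n_bands + 1)   # assumed band edges in Hz
    out = np.zeros((n_bands, mag.shape[1]))
    for b in range(n_bands):
        lo = edges[b]
        hi = edges[b + 1] if b < n_bands - 1 else np.inf  # last band includes F[-1]
        rows = (F >= lo) & (F < hi)
        if rows.any():
            out[b] = mag[rows].max(axis=0)
    return out

def local_maxima_mask(image, size=3):
    """True where a positive value equals the maximum of its size x size neighborhood."""
    pad = size // 2
    padded = np.pad(image, pad, mode="constant", constant_values=-np.inf)
    rows, cols = image.shape
    neighborhood_max = np.full(image.shape, -np.inf)
    for dr in range(size):
        for dc in range(size):
            shifted = padded[dr : dr + rows, dc : dc + cols]
            neighborhood_max = np.maximum(neighborhood_max, shifted)
    return (image == neighborhood_max) & (image > 0)

def fingerprint(S, F):
    """Flatten the binary peak image into a binary vector."""
    return local_maxima_mask(band_maxima(S, F)).astype(np.uint8).ravel()

# Synthetic spectrogram (hypothetical): a single burst at 500 Hz in frame 2.
F_demo = np.linspace(0.0, 4000.0, 129)   # 31.25 Hz spacing, so row 16 is 500 Hz
S_demo = np.zeros((129, 5))
S_demo[16, 2] = 10.0

fp = fingerprint(S_demo, F_demo)
```

With these assumed band edges, 500 Hz falls in the fourth band, so `fp` has 6 x 5 = 30 entries with a single 1 at the position for that band and frame.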

Decision Tree
- Decision Trees recursively split the dataset into smaller subsets based on the most informative feature at each node. Each internal node represents a test on a feature, each branch represents an outcome of that test, and each leaf node represents a predicted class. Because the features chosen near the top of the tree are the ones that best separate the classes, we decided that a Decision Tree classifier would be useful for determining which audio features are most important for the classification of an artist.
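To illustrate how a single node chooses its split, the Python/NumPy sketch below scores every candidate threshold on every feature by weighted Gini impurity and keeps the best one. Gini impurity is one common splitting criterion and is assumed here; the feature values and artist labels are made up for the example.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 0 for a pure set, larger for more mixed sets."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return (feature index, threshold, weighted impurity) of the best split."""
    n_samples, n_features = X.shape
    best = (None, None, gini(y))      # a split must beat the unsplit impurity
    for j in range(n_features):
        values = np.unique(X[:, j])
        # Candidate thresholds are midpoints between consecutive sorted values.
        for thr in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= thr
            score = (left.sum() * gini(y[left])
                     + (~left).sum() * gini(y[~left])) / n_samples
            if score < best[2]:
                best = (j, thr, score)
    return best

# Hypothetical data: two audio features per clip, two artists.
X = np.array([[0.2, 100.0], [0.9, 120.0], [0.4, 300.0], [0.7, 320.0]])
y = np.array(["artist_a", "artist_a", "artist_b", "artist_b"])

feature, threshold, impurity = best_split(X, y)
```

Here the second feature separates the two artists perfectly, so it is selected as the root split; a full tree repeats this search on each resulting subset.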

K-Nearest Neighbors
- K-Nearest Neighbors (k-NN) is a machine learning algorithm that can be used for classification. In k-NN, a data point is classified based on the majority class of its k nearest neighbors in the feature space (or, for regression, predicted as the average of their values). The "k" represents the number of neighbors considered, and proximity is determined using a distance metric, commonly Euclidean distance. For classification, the class that occurs most frequently among the k neighbors is assigned to the target data point.
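A minimal k-NN classifier following this description might look like the Python/NumPy sketch below. The function name `knn_predict`, the two-dimensional feature vectors, and the artist labels are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = np.linalg.norm(train_X - query, axis=1)   # Euclidean distance
    nearest = np.argsort(distances)[:k]                   # indices of k closest
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set: two well-separated clusters of feature vectors.
train_X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
train_y = ["artist_a"] * 3 + ["artist_b"] * 3

label_near_a = knn_predict(train_X, train_y, np.array([1.1, 1.0]))
label_near_b = knn_predict(train_X, train_y, np.array([4.9, 5.0]))
```

Choosing an odd k avoids tied votes in two-class problems; with more classes, ties are still possible and need a tie-breaking rule.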

SVM
- Support Vector Machines (SVMs) are a type of machine learning model that can be used for classification. These classifiers work by finding the optimal hyperplane in a high-dimensional space that best separates different classes of data points. This hyperplane is chosen to maximize the margin, that is, the distance between the hyperplane and the nearest data points of each class. With kernel functions, SVMs can also separate classes that are not linearly separable in the original feature space.
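As a rough illustration of margin maximization, the Python/NumPy sketch below trains a linear soft-margin SVM by subgradient descent on the regularized hinge loss. This is a simplified stand-in for a library SVM solver, not the project's classifier; the function names, hyperparameters, and toy data are assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))); y is +/-1."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0        # points inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n_samples
        grad_b = -y[active].sum() / n_samples
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def svm_predict(X, w, b):
    """Assign +1 or -1 depending on the side of the hyperplane w.x + b = 0."""
    return np.where(X @ w + b >= 0.0, 1, -1)

# Hypothetical, linearly separable toy data with labels +1 and -1.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

w, b = train_linear_svm(X, y)
train_pred = svm_predict(X, w, b)
margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin planes
```

Only the points with margin below 1 contribute to the update, which mirrors the idea that the hyperplane is determined by the nearest data points of each class.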