See the Sample Rate control on the mixer control panel.
You should first decide what sample rate is most appropriate for your application. While many recordings are made at CD-audio quality (44,100 samples per second), this is often not the best choice.
The sample rate should be at least twice the highest dominant frequency in a vocalization. However, we recommend that you choose a sample rate that is not much higher than this. While faster sampling rates can resolve high-frequency harmonics of a vocalization, these harmonics usually offer only redundant information and are not particularly helpful in identification. In addition, the limited dynamic range available in noisy field environments often renders higher-frequency harmonics undetectable, unlike in high-quality recordings made with directional microphones under ideal conditions. Finally, because a given FFT size provides a fixed number of frequency bins, higher sampling rates result in less frequency detail in the lower frequencies, which may in fact be more important for recognition.
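The trade-off between sample rate and low-frequency detail can be illustrated with a little arithmetic. The sketch below is not part of Song Scope; the function names and the FFT size of 256 are assumptions chosen for illustration. It relies only on the standard relationship that the spacing between FFT bins equals the sample rate divided by the FFT size.

```python
import math


def bin_width_hz(sample_rate_hz: float, fft_size: int) -> float:
    """Spacing between adjacent FFT frequency bins, in Hz."""
    return sample_rate_hz / fft_size


def whole_bins_below(cutoff_hz: float, sample_rate_hz: float, fft_size: int) -> int:
    """Number of whole bin widths that fit between 0 Hz and cutoff_hz."""
    return math.floor(cutoff_hz / bin_width_hz(sample_rate_hz, fft_size))


if __name__ == "__main__":
    # Hypothetical comparison: same FFT size, three different sample rates.
    for rate in (44100, 22050, 11025):
        print(rate, bin_width_hz(rate, 256), whole_bins_below(4000, rate, 256))
```

With an FFT size of 256, a 44,100 samples-per-second recording spreads only 23 whole bins across the region below 4,000 Hz, while an 11,025 samples-per-second recording spreads 92 bins across the same region.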
There are computational advantages to choosing a sample rate equal to the source sample rate divided by an integer. In other words, a recording made at 44,100 samples per second can be reduced more efficiently to 22,050 (divided by 2) or 11,025 (divided by 4) samples per second than to 16,000 samples per second.
For amphibian monitoring, most frogs vocalize under 4,000 Hz, so sampling rates over 8,000 Hz are recommended. If the source recording is sampled at 44,100 samples per second, we recommend a sample rate of 11,025 samples per second.
For birds, most species vocalize well below 10,000 Hz, so a sample rate of 22,050 samples per second is sufficient. That said, many birds such as owls and doves have vocalizations under 1,000 Hz, so sampling rates of only 2,000 Hz would be acceptable for these species.
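The recommendations above combine two rules: sample at no less than twice the highest frequency of interest, and prefer a rate that is the source rate divided by an integer. A minimal sketch of that selection follows. The function name and the default divisor list (1, 2, 4) are assumptions taken from the examples in this section, not a description of how Song Scope chooses rates.

```python
def pick_sample_rate(source_rate_hz: float,
                     highest_freq_hz: float,
                     divisors=(1, 2, 4)) -> float:
    """Return the lowest candidate rate (source rate / divisor) that is
    still at least twice the highest frequency of interest."""
    minimum_rate = 2 * highest_freq_hz
    candidates = [source_rate_hz / d for d in divisors]
    usable = [rate for rate in candidates if rate >= minimum_rate]
    if not usable:
        raise ValueError("source sample rate is too low for this frequency")
    return min(usable)


if __name__ == "__main__":
    print(pick_sample_rate(44100, 4000))   # frogs, vocalizing under 4,000 Hz
    print(pick_sample_rate(44100, 10000))  # birds, vocalizing below 10,000 Hz
```

For a 44,100 samples-per-second source, this returns 11,025 for the frog example and 22,050 for the bird example, matching the rates suggested above.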
See the FFT Size and FFT Overlap controls on the spectrogram control panel.
After adjusting the sample rate as described above, you should next choose the optimum FFT parameters. The best way to do this is by viewing the spectrogram of a specific vocalization and seeing how changing the FFT size affects the spectrogram plot. Larger FFT sizes show more frequency resolution at the expense of detail on the time axis, while smaller FFT sizes show more detail on the time axis at the expense of frequency resolution. For example, for vocalizations with a rapid pulsing "trill", smaller FFT sizes may be better for resolving the individual pulses.
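As a rough aid when experimenting with FFT sizes, the following sketch compares the duration of one FFT window with the spacing between pulses in a trill. The function names, the list of candidate FFT sizes, and the rule of thumb that a window should be shorter than one pulse period are all assumptions for illustration. The sketch ignores FFT overlap and is no substitute for looking at the spectrogram.

```python
def window_duration_ms(fft_size: int, sample_rate_hz: float) -> float:
    """Length of audio covered by a single FFT window, in milliseconds."""
    return 1000.0 * fft_size / sample_rate_hz


def fft_sizes_resolving_pulses(pulse_rate_hz: float,
                               sample_rate_hz: float,
                               fft_sizes=(128, 256, 512, 1024)) -> list:
    """Candidate FFT sizes whose window is shorter than one pulse period.

    Rule of thumb only (an assumption): a window longer than the pulse
    period tends to smear neighbouring pulses together.
    """
    pulse_period_ms = 1000.0 / pulse_rate_hz
    return [n for n in fft_sizes
            if window_duration_ms(n, sample_rate_hz) < pulse_period_ms]


if __name__ == "__main__":
    # Hypothetical trill of 30 pulses per second, sampled at 11,025 Hz.
    print(fft_sizes_resolving_pulses(30, 11025))
```

At 11,025 samples per second a 256-point window spans about 23 ms and a 512-point window about 46 ms, so for a hypothetical 30-pulse-per-second trill (one pulse roughly every 33 ms) only the 128 and 256 sizes pass this check.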
See the Frequency Minimum and Frequency Range controls on the spectrogram control panel.
After adjusting the FFT parameters as described above, you should next choose the optimum minimum frequency. We recommend that you use the Logarithmic Scale view of the spectrogram plot because the minimum frequency plays a very important role in determining the log frequency scale used by the recognizer.
It is also important to understand that background noise is generally stronger at lower frequencies and can corrupt a signal, making it difficult to recognize accurately.
We recommend that you set the minimum frequency as high as possible while keeping it just below the lowest frequency component of the vocalization of interest. It is best to do this while observing vocalizations in the spectrogram plot using the logarithmic scale view.
After setting the minimum frequency, you can then adjust the frequency range to end just above the highest frequency component of the vocalization. Together, these two controls set the range of frequencies that the recognizer will consider, effectively eliminating background noise sources at lower frequencies and competing signals (from other species) at higher frequencies.
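To see how much of the spectrum a given band actually retains, the sketch below maps a minimum frequency and a frequency range onto FFT bin indices. It assumes both values are expressed in Hz, which may not match the units used by the controls themselves; the function name and example values are hypothetical.

```python
import math


def band_to_bins(min_freq_hz: float, freq_range_hz: float,
                 sample_rate_hz: float, fft_size: int) -> tuple:
    """Return (first_bin, last_bin) covering the requested band.

    The band is widened outward to whole bins and clamped to the
    highest usable bin (fft_size // 2, the Nyquist bin).
    """
    bin_width = sample_rate_hz / fft_size
    first_bin = math.floor(min_freq_hz / bin_width)
    last_bin = math.ceil((min_freq_hz + freq_range_hz) / bin_width)
    return first_bin, min(last_bin, fft_size // 2)


if __name__ == "__main__":
    # Hypothetical vocalization occupying 1,000 Hz to 3,000 Hz.
    first, last = band_to_bins(1000, 2000, 11025, 256)
    print(first, last, last - first + 1)
```

In this example the recognizer would consider bins 23 through 70, which is 48 of the 129 bins available, so everything below about 1,000 Hz and above about 3,000 Hz is ignored.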
See the Background Filter control on the spectrogram control panel.
We recommend that you always enable the background filter. A setting of one second is best for most applications.
The most important parameter in signal detection is the Dynamic Range control on the detector control panel. The dynamic range control sets a limit on how much of the signal energy (in decibels measured relative to the peak signal) will be used in comparing waveforms.
The dynamic range should be matched to the expected signal-to-noise ratio of the field recordings to be analyzed. If the dynamic range is set too high (i.e., much higher than the signal-to-noise ratio), then Song Scope will look for spectral details that are lost in the noise of actual field recordings, resulting in poor recognition performance. On the other hand, if the dynamic range is set too low, spectral details important to accurate classification may not be considered.
The dynamic range setting is used in conjunction with the other signal detection controls to classify portions of a signal into syllables and inter-syllable gaps in a song. Using the Logarithmic Scale with Signal Detection view of the waveform plot, you can see the effects of changing these controls while viewing a specific vocalization. We recommend that you use this view and adjust the settings appropriately, keeping in mind that the dynamic range should also reflect recordings made under actual field conditions.
The dynamic range setting also determines how much of the signal's frequency content is considered. The effects can be seen by using the Logarithmic Scale with Signal Normalization view of the spectrogram plot.
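The idea of a dynamic range measured in decibels relative to the peak can be sketched as a simple mask over power values. This is an illustration of the concept only, under the assumption that components more than the chosen number of decibels below the peak are ignored; it is not Song Scope's detection algorithm, and the function name is hypothetical.

```python
import math


def within_dynamic_range(powers, dynamic_range_db: float) -> list:
    """Flag each power value that lies within dynamic_range_db of the peak.

    Levels are computed as 10 * log10(power / peak), so the peak is 0 dB
    and weaker components are negative.
    """
    peak = max(powers)
    if peak <= 0:
        return [False] * len(powers)
    flags = []
    for power in powers:
        if power <= 0:
            flags.append(False)  # no energy at all, always excluded
            continue
        level_db = 10.0 * math.log10(power / peak)
        flags.append(level_db >= -dynamic_range_db)
    return flags


if __name__ == "__main__":
    # Hypothetical components at roughly 0, -10, -20 and -30 dB.
    print(within_dynamic_range([1.0, 0.1, 0.01, 0.001], 15))
```

With a dynamic range of 15 dB, only the two strongest components in this example are kept. Raising the value admits weaker components, which helps only if those components rise above the noise in the recordings being scanned.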
We recommend that you train with several different recordings that together represent the variations common in a particular vocalization.
It is common to have access to very high quality recordings for training data, such as those made with parabolic reflectors. However, open microphone field recordings, such as those made by unattended field recorders like Song Meter, will sound different. In addition to adjusting the dynamic range parameter discussed above, you may also want to reduce the model resolution by lowering the Maximum Resolution value. In our experience, a value of 8 is good in these conditions. You may also find that increasing the Maximum Syllable and Maximum Syllable Gap values helps. We recommend setting the detection parameters to work best for the recordings you wish to scan, rather than for the higher quality training data. A little trial and error goes a long way toward finding the best parameters for the job.
When the model is built, pay attention to the "Cross training" percentage and standard deviation displayed in the "Recognizer Information" window. A low score (e.g. < 50%) or a large standard deviation (e.g. > 15%) may indicate that the generated model will not perform well. In this case, you may wish to try higher values for the Maximum Complexity parameter or different values (larger and smaller) for the Maximum Resolution parameter. You may also try breaking up the training data into smaller subclasses.
Be aware that a high-scoring model is not necessarily a good discriminating model: it might simply be matching things too easily, which could result in high scores (and false positives) for incorrect vocalizations as well.
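If you keep notes on several trial models, a small helper like the one below can flag the ones worth revisiting. The function name is hypothetical, and the default thresholds simply reuse the example figures quoted above (a score under 50% or a standard deviation over 15%); they are illustrative, not fixed limits.

```python
def cross_training_warnings(score_pct: float,
                            std_dev_pct: float,
                            min_score_pct: float = 50.0,
                            max_std_dev_pct: float = 15.0) -> list:
    """Return a list of warnings for a model's cross-training results."""
    warnings = []
    if score_pct < min_score_pct:
        warnings.append("low cross-training score")
    if std_dev_pct > max_std_dev_pct:
        warnings.append("large cross-training standard deviation")
    return warnings


if __name__ == "__main__":
    # Hypothetical results from two trial models.
    print(cross_training_warnings(72.0, 6.5))
    print(cross_training_warnings(41.0, 18.0))
```

An empty list means only that neither example threshold was crossed. A model that passes this check can still produce false positives, so it should be tested against recordings containing other species as well.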