Wildlife Acoustics

Song Scope Version 2.0 Performance Analysis

Comparing new Version 2.0 algorithms with Version 1.0

Version 2.0 introduces several improvements to the Song Scope sound classification algorithms, aimed at improving both accuracy (the number of detections that correctly match the desired vocalization, relative to the total number of detections) and sensitivity (the number of correct detections, relative to the number of actual vocalizations present in the recordings as identified by a human expert).
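The two definitions above correspond to what are often called precision and recall. The sketch below simply restates them as functions. The function names are illustrative and not part of Song Scope. The usage lines apply `sensitivity` to the manually reviewed 8 AM sample counts reported later in this analysis (118 vocalizations, of which version 1.0 found 35 and version 2.0 found 53).

```python
def accuracy(correct: int, total_detections: int) -> float:
    """Fraction of reported detections that match the target vocalization."""
    return correct / total_detections if total_detections else 0.0


def sensitivity(correct: int, actual_vocalizations: int) -> float:
    """Fraction of vocalizations present (per a human expert) that were detected."""
    return correct / actual_vocalizations if actual_vocalizations else 0.0


# Sensitivity on the manually reviewed 8 AM sample (118 vocalizations).
print(f"{sensitivity(35, 118):.1%}")  # version 1.0: 29.7%
print(f"{sensitivity(53, 118):.1%}")  # version 2.0: 44.9%
```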

To measure and illustrate these improvements, we refer to our report "Automatic Detection of Cerulean Warblers". In that study, the original version of Song Scope was used to detect Cerulean vocalizations in approximately 250 hours of field recordings collected at several sites in the Allegheny National Forest in 2007. This was an extremely difficult task for automatic classifiers: Cerulean Warbler songs are faint, vary widely from one individual to another, and closely resemble the vocalizations of other abundant species, including the Black-throated Blue Warbler and the Chestnut-sided Warbler.

We repeat the "Second-Pass Recognizer" exercise described in that report with the new version 2.0 algorithms, using exactly the same inputs (e.g. training recordings, field recordings, and parameters). The only other change was reducing the dimensionality of the feature vectors from 16 to 5, as we have found that smaller feature vectors appear to perform better with the version 2.0 software.

Both versions of Song Scope use "minimum quality" and "minimum score" values to filter the results. These parameters trade sensitivity against accuracy. Raising the minimums passes only stronger matches, yielding higher accuracy at the cost of lower sensitivity. Lowering the minimums passes more matches, yielding higher sensitivity at the cost of lower accuracy.
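The trade-off can be illustrated with a minimal sketch of a two-threshold filter. The `Detection` record, its field names, and the detection values are all hypothetical, and the sketch assumes a detection passes when it meets or exceeds each minimum; Song Scope's actual comparison rule and value ranges may differ.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical detection record; field names are illustrative, not Song Scope's.
    quality: float
    score: float
    is_correct: bool  # ground truth from manual review


def apply_filters(detections, min_quality, min_score):
    """Keep detections meeting both minimums (assumes inclusive thresholds)."""
    return [d for d in detections if d.quality >= min_quality and d.score >= min_score]


def summarize(kept):
    """Return (correct detections, accuracy) for the kept detections."""
    correct = sum(d.is_correct for d in kept)
    return correct, (correct / len(kept) if kept else 0.0)


# Made-up values for illustration only; not data from the study.
detections = [
    Detection(20, 75, True),
    Detection(10, 60, False),
    Detection(30, 80, True),
    Detection(15, 70, False),
    Detection(12, 65, True),
]

print(summarize(apply_filters(detections, min_quality=10, min_score=60)))  # (3, 0.6)
print(summarize(apply_filters(detections, min_quality=20, min_score=75)))  # (2, 1.0)
```

With the stricter minimums, accuracy rises from 60% to 100%, but one correct detection is lost, so sensitivity falls.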

Estimated sensitivity assumes that the roughly 250 hours of field recordings contain 5,369 detectable Cerulean vocalizations. This count is derived as follows. In the original report, the 5-minute recording made at 8 AM each day at each location was reviewed manually, yielding 118 detectable Cerulean vocalizations. The version 1.0 algorithms detected 35 of these vocalizations in that sample and 1,552 Ceruleans across the full 250 hours. Assuming the detection rate on the sample holds across all recordings, scaling 1,552 by 118/35 gives an estimated total of 5,232 vocalizations. The version 2.0 algorithms detected 53 of the sampled vocalizations and 2,473 Ceruleans across the full 250 hours; scaling by 118/53 gives an estimated total of 5,506 vocalizations. These two estimates are within about 5% of each other, so we split the difference to arrive at the final estimate of 5,369 detectable Cerulean vocalizations.
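The extrapolation above can be checked with a few lines of arithmetic. The helper name `extrapolate_total` is illustrative; the counts are the ones given in the text, and the sketch carries the same assumption the estimate does, namely that each version's sensitivity on the 8 AM sample holds for the whole data set.

```python
def extrapolate_total(manual_count, sample_detected, total_detected):
    """Scale total detections by the inverse of the sample sensitivity.

    Assumes the detector's sensitivity on the manually reviewed 8 AM
    samples holds across the full set of recordings.
    """
    return total_detected * manual_count / sample_detected


MANUAL_COUNT = 118  # vocalizations found by manual review of the 8 AM samples

v1 = extrapolate_total(MANUAL_COUNT, 35, 1552)  # version 1.0
v2 = extrapolate_total(MANUAL_COUNT, 53, 2473)  # version 2.0
estimate = round((v1 + v2) / 2)                 # split the difference

print(round(v1), round(v2), estimate)  # 5232 5506 5369
print(f"{(v2 - v1) / v2:.1%}")         # 5.0% (gap relative to the larger estimate)
```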

The chart above plots several permutations of the minimum quality and minimum score filter values for both versions. Estimated sensitivity is the percentage of correct detections out of the estimated 5,369 vocalizations present. Accuracy is the percentage of correct detections out of all detections reported. Only the best permutations, ranked by the sum of accuracy and estimated sensitivity, are shown, together with a sixth-order polynomial trendline for each version.
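The selection rule described above can be sketched as follows: compute both percentages for each (minimum quality, minimum score) permutation and keep the top entries by their sum. The function names and the counts in the usage example are hypothetical, not results from the study. A sixth-order trendline through the selected points could then be fit with, for example, `numpy.polyfit(x, y, 6)`, which needs at least seven points.

```python
TOTAL_VOCALIZATIONS = 5369  # estimated total from the extrapolation


def evaluate(correct, total_detections, total_vocalizations=TOTAL_VOCALIZATIONS):
    """Return (estimated sensitivity %, accuracy %) for one filter permutation."""
    sens = 100 * correct / total_vocalizations
    acc = 100 * correct / total_detections if total_detections else 0.0
    return sens, acc


def best_permutations(results, k):
    """Rank permutations by sensitivity + accuracy and keep the top k.

    results maps (min_quality, min_score) -> (correct, total_detections).
    """
    scored = []
    for params, (correct, total) in results.items():
        sens, acc = evaluate(correct, total)
        scored.append((sens + acc, params, sens, acc))
    scored.sort(reverse=True)  # highest combined score first
    return [(params, sens, acc) for _, params, sens, acc in scored[:k]]


# Hypothetical counts for illustration only.
results = {
    (10, 60): (2000, 4000),
    (20, 70): (1500, 2000),
    (30, 80): (800, 850),
}
for params, sens, acc in best_permutations(results, k=2):
    print(params, f"sensitivity={sens:.1f}%", f"accuracy={acc:.1f}%")
```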

The version 2.0 algorithms clearly outperform the version 1.0 algorithms in both sensitivity and accuracy. Furthermore, the tighter clustering of the version 2.0 filter permutations indicates that its score and quality values more accurately reflect the probability that a detection is correct.

Note that this chart compares the two algorithms under very specific and challenging circumstances. While we expect the version 2.0 algorithms to outperform version 1.0 in nearly all cases, the absolute and relative performance of these algorithms may vary considerably under other conditions.