Hi vocalpy team,
Great work on developing the toolkits! Thank you so much for making them publicly accessible!
I’ve been using vak to train a TweetyNet model on my own vocalization data for an annotation task. I’m a little confused by the results: it seems that the syllable accuracy is not being calculated correctly by the vak eval function. Here is one example output from vak eval:
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_acc: 0.85201
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_acc_tfm: 0.85201
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_levenshtein: 45.52941
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_levenshtein_tfm: 45.52941
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_segment_error_rate: 0.78289
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_segment_error_rate_tfm: 0.78289
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_loss: 0.52653
If I understand the results correctly, my model achieves 85% frame accuracy, but the syllable accuracy is pretty bad. However, when I use vak predict to generate predicted labels, the results aren’t that bad: I compared the predicted labels to the ground-truth labels and calculated the Levenshtein distance myself using the metrics.Levenshtein() function, and the average syllable error rate comes out to only 26.8%, instead of the 78.3% reported by vak eval.
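In case it helps to see exactly what I computed: below is a plain-Python sketch of my calculation (a standard Levenshtein edit distance normalized by the ground-truth sequence length, which is my understanding of how the segment error rate is defined — I may be wrong about how vak normalizes it, so please correct me if so). The function names here are my own, not vak’s.

```python
def levenshtein(source: str, target: str) -> int:
    """Classic dynamic-programming edit distance between two label strings."""
    if len(source) < len(target):
        return levenshtein(target, source)
    if len(target) == 0:
        return len(source)
    prev_row = list(range(len(target) + 1))
    for i, s_char in enumerate(source):
        curr_row = [i + 1]
        for j, t_char in enumerate(target):
            insertions = prev_row[j + 1] + 1
            deletions = curr_row[j] + 1
            substitutions = prev_row[j] + (s_char != t_char)
            curr_row.append(min(insertions, deletions, substitutions))
        prev_row = curr_row
    return prev_row[-1]


def segment_error_rate(y_pred: str, y_true: str) -> float:
    """Edit distance normalized by the length of the ground-truth sequence."""
    return levenshtein(y_pred, y_true) / len(y_true)
```

I ran this over each (predicted, ground-truth) label-string pair from vak predict and averaged the per-file rates to get the 26.8% figure.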
I’ve been thinking about this for a while but couldn’t figure out why the syllable error rate differs so much between vak predict and vak eval. Any thoughts?
I’m using the ‘simple-seq’ annotation format, and all of my labels are single characters.
Thank you!
Zhilei