Error in last step of generating predictions

Hello! I recently updated vak to v0.8.0 and TweetyNet to v0.9.0 and have encountered a new error (that I did not get with my previous installs) at the very last step of using the predict method of vak.

Here is the full traceback:

Traceback (most recent call last):
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\Scripts\vak-script.py", line 9, in <module>
    sys.exit(main())
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\lib\site-packages\vak\__main__.py", line 48, in main
    cli.cli(command=args.command, config_file=args.configfile)
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\lib\site-packages\vak\cli\cli.py", line 49, in cli
    COMMAND_FUNCTION_MAP[command](toml_path=config_file)
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\lib\site-packages\vak\cli\cli.py", line 18, in predict
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\lib\site-packages\vak\cli\predict.py", line 50, in predict
    core.predict(
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\lib\site-packages\vak\core\predict.py", line 244, in predict
    seq = crowsetta.Sequence.from_keyword(
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\lib\site-packages\crowsetta\sequence.py", line 382, in from_keyword
    labels) = cls._validate_onsets_offsets_labels(onsets_s,
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\lib\site-packages\crowsetta\sequence.py", line 224, in _validate_onsets_offsets_labels
    raise ValueError('must provide either onset_inds and offset_inds, or '
ValueError: must provide either onset_inds and offset_inds, or onsets_s and offsets_s

I did a little bit of digging and determined that audio segments without any calls are tripping up the crowsetta conversion process. Specifically, at line 239 of the vak predict.py file, there is a check for empty segments that is going awry because labels is turning up an empty list rather than a None object.

if labels is None and onsets_s is None and offsets_s is None:
    # handle the case when all time bins are predicted to be unlabeled
    # see https://github.com/NickleDave/vak/issues/383
    continue
seq = crowsetta.Sequence.from_keyword(
    labels=labels, onsets_s=onsets_s, offsets_s=offsets_s
)

Inspecting the variables at this step shows that onsets_s and offsets_s are None as expected, but labels is not.

(Pdb) print(len(labels))
0
(Pdb) print(len(onsets_s))
*** TypeError: object of type 'NoneType' has no len()
(Pdb) print(len(offsets_s))
*** TypeError: object of type 'NoneType' has no len()
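
To make the failure mode concrete, here is a minimal sketch that reproduces just the guard condition, using stand-in values that match the Pdb output above. It does not import vak or crowsetta; the variable `skipped` is only an illustrative name, not something from the vak source.

```python
# Stand-in values matching the Pdb session above (assumed, not read from vak).
labels = []        # an empty list rather than None
onsets_s = None
offsets_s = None

# Same condition as the guard quoted from predict.py:
# it only skips the file when all three values are None.
skipped = labels is None and onsets_s is None and offsets_s is None

print(skipped)  # False, so execution falls through to Sequence.from_keyword
```

Because an empty list is not None, the guard evaluates to False and the file is not skipped, which is consistent with the ValueError in the traceback.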

Not sure how I can adjust my data to avoid this error—what am I missing?

Thanks!

Hi @nhoglen, really sorry you’re having this issue. It does sound like a bug.

I hate to ask you to do more work, but could you file a bug report here?

That way we can manage the bugfix through GitHub, and make sure you get credit for catching it.

All the information you’ve provided is really helpful. I think you can mostly just copy it there.

Not sure how I can adjust my data to avoid this error—what am I missing?

I think we probably need to fix the function to handle this edge case instead of asking you to do anything with your data.

If there’s any way you can share a small sample of data to make it easier to reproduce the bug, that would be really helpful. You could share it with me by email (nicholdav at gmail), e.g. as a Google Drive or Dropbox link, or by attaching it as a zip to the GitHub issue. I’d basically need the results dir you’re using in your predict file (with the checkpoint, spect scaler, labelmap, etc.) as well as the dataset you’ve prepared for predictions (both the files and the csv generated by prep). Just the one file that causes the bug should be enough if you’d rather not share a giant dataset.

Let me know if that makes sense. I can fix it this weekend.

Thank you!

Sounds good! Thanks for your help. I’ll get on submitting it on GitHub.
