Hi all,
I installed vak on Linux, and everything worked fine with the Bengalese finch test set from the tutorial. Now I’m trying my own recordings, but I think I’m running into memory problems during vak prep.
It starts as usual, but crashes with the following error after about 10-20 seconds:
File "/home/sita/anaconda3/envs/tweetyS4/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
(full error message below)
Things I tried:
- setting device in the toml to either ‘cpu’ or ‘gpu’
- changing number of workers (between 1 and 8 if I remember correctly)
- reducing traindur, valdur and testdur, and the number of epochs, but I think these are only used in the training step, right?
- using a shorter file. With only 1 short file (5 minutes, 59.3 MB) vak prep does seem to work, so I presume the long files are the problem. (The original files are about 30 min and about 330-350 MB per file.)
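A rough way to sanity-check the memory hypothesis is to estimate how large one recording becomes once its samples are held in memory as an array. The sketch below uses only the standard-library `wave` module. The function name `decoded_size_mb` is hypothetical, and the default of 8 bytes per sample is an assumption (float64); I don't know what dtype vak actually uses internally, and the spectrogram computed from the audio takes additional memory on top of this.

```python
import wave


def decoded_size_mb(wav_path, bytes_per_sample=8):
    """Approximate size in MB of a WAV file's samples held as one array.

    bytes_per_sample=8 ASSUMES the samples are converted to float64;
    this is a guess, not a statement about what vak does internally.
    """
    with wave.open(str(wav_path), "rb") as reader:
        # Total number of sample values = frames * channels.
        n_values = reader.getnframes() * reader.getnchannels()
    return n_values * bytes_per_sample / 1e6


# Example usage (the file name is a placeholder):
# print(decoded_size_mb("my_30_minute_recording.wav"))
```

Multiplying the result by the number of workers gives a lower bound on what the process pool might need at once, which can then be compared with the free RAM on the machine.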
As a workaround, I could write a script to split all files into shorter ones, but given this issue https://github.com/dask/dask/issues/8506 I thought you might have ideas for a better solution? Thanks of course for already putting effort into this, David!
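For reference, the splitting workaround could look roughly like the sketch below. It assumes the recordings are uncompressed PCM WAV files that the standard-library `wave` module can read. The function name `split_wav`, the output naming scheme, and the 5-minute default are all my own choices for illustration. It only splits the audio; any annotation files would have to be split and re-timed separately, which is the part that makes this workaround unattractive.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, chunk_seconds=300.0):
    """Split one WAV file into consecutive chunks of at most chunk_seconds.

    Returns the list of chunk paths that were written.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(chunk_seconds * params.framerate)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small for this sample rate")
        index = 0
        while True:
            # Read the next block of frames; an empty result means end of file.
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = out_dir / f"{src.stem}_part{index:03d}.wav"
            with wave.open(str(out_path), "wb") as writer:
                # Copy channels, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(out_path)
            index += 1
    return written


# Example usage (directory names are placeholders):
# for wav in sorted(Path("recordings").glob("*.wav")):
#     split_wav(wav, "recordings_split", chunk_seconds=300.0)
```

Reading one chunk at a time keeps the script's own memory use small, so it should not run into the same problem as the long files themselves.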
My vak version is 1.0.3, so more recent than the issue post above.
System info: Linux Mint 21.3, Ubuntu 22.04
Graphics card: GeForce GTX 1650
Any ideas are welcome.
Thanks in advance!
Sita
full error message:
2024-11-24 16:41:28,443 - vak.prep.frame_classification.frame_classification - INFO - vak version: 1.0.3
2024-11-24 16:41:28,443 - vak.prep.frame_classification.frame_classification - INFO - Will prepare dataset as directory: /media/sita/sth8T/hoornraven/hoornraven_Tweetynet/tweetynet_hornbills_vanlaptopWin/tweetynet_test_hornb/recording120324M/train/prep_out20241124/used-vak-frame-classification-dataset-generated-241124_164128
2024-11-24 16:41:28,647 - vak.prep.spectrogram_dataset.prep - INFO - making array files containing spectrograms from audio files in: /media/sita/sth8T/hoornraven/hoornraven_Tweetynet/tweetynet_hornbills_vanlaptopWin/tweetynet_test_hornb/recording120324M/train/used
2024-11-24 16:41:28,647 - vak.prep.spectrogram_dataset.audio_helper - INFO - creating array files with spectrograms
[ ] | 0% Completed | 21.83 sms
Traceback (most recent call last):
File "/home/sita/anaconda3/envs/tweetyS4/bin/vak", line 10, in <module>
sys.exit(main())
^^^^^^
File "/home/sita/anaconda3/envs/tweetyS4/lib/python3.12/site-packages/vak/__main__.py", line 49, in main
cli.cli(command=args.command, config_file=args.configfile)
File "/home/sita/anaconda3/envs/tweetyS4/lib/python3.12/site-packages/vak/cli/cli.py", line 54, in cli
COMMAND_FUNCTION_MAP[command](toml_path=config_file)
File "/home/sita/anaconda3/envs/tweetyS4/lib/python3.12/site-packages/vak/cli/cli.py", line 28, in prep
prep(toml_path=toml_path)
File "/home/sita/anaconda3/envs/tweetyS4/lib/python3.12/site-packages/vak/cli/prep.py", line 134, in prep
_, dataset_path = prep_module.prep(
^^^^^^^^^^^^^^^^^
File "/home/sita/anaconda3/envs/tweetyS4/lib/python3.12/site-packages/vak/prep/prep_.py", line 194, in prep
dataset_df, dataset_path = prep_frame_classification_dataset(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sita/anaconda3/envs/tweetyS4/lib/python3.12/site-packages/vak/prep/frame_classification/frame_classification.py", line 276, in prep_frame_classification_dataset
source_files_df: pd.DataFrame = get_or_make_source_files(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sita/anaconda3/envs/tweetyS4/lib/python3.12/site-packages/vak/prep/frame_classification/source_files.py", line 144, in get_or_make_source_files
source_files_df = prep_spectrogram_dataset(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sita/anaconda3/envs/tweetyS4/lib/python3.12/site-packages/vak/prep/spectrogram_dataset/prep.py", line 151, in prep_spectrogram_dataset
spect_files = audio_helper.make_spectrogram_files_from_audio_files(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sita/anaconda3/envs/tweetyS4/lib/python3.12/site-packages/vak/prep/spectrogram_dataset/audio_helper.py", line 247, in make_spectrogram_files_from_audio_files
spect_files = list(bag.map(_spect_file))
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sita/anaconda3/envs/tweetyS4/lib/python3.12/site-packages/dask/bag/core.py", line 1488, in __iter__
return iter(self.compute())
^^^^^^^^^^^^^^
File "/home/sita/anaconda3/envs/tweetyS4/lib/python3.12/site-packages/dask/base.py", line 372, in compute
(result,) = compute(self, traverse=False, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sita/anaconda3/envs/tweetyS4/lib/python3.12/site-packages/dask/base.py", line 660, in compute
results = schedule(dsk, keys, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sita/anaconda3/envs/tweetyS4/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.