Apr 13, 2020
Thanks for the insights in this blog post. I have been playing with getting a similar setup to run on spot instances, and I wonder if you have done so.
The documentation on TensorFlow and spot training says AWS will automagically pick up training from a checkpoint without any glue code. Does this still apply when using GDV? I'm a bit confused, since I can't find any documentation on how SageMaker picks up from a checkpoint.