Data Management / DM-23528

Look into slow pull times for images at the LDF


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: Notebooks
    • Labels: None

      Description

      We are seeing consistent timeouts of the image pulls on nublado running at the LDF. The timeout is 15 minutes, which is considerably longer than we would expect a pull to take given the performance we've seen on GKE.

      Characterize how long things are taking and how long we think they should take.
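
      One way to start on this characterization is to time the pull step in isolation and compare it with the 15-minute timeout. The sketch below is only illustrative: the helper names time_command and summarize are hypothetical, and the command to run is left to the caller, since the ticket does not say how images are pulled at the LDF.

```python
import subprocess
import time

# The 15-minute timeout quoted in the description, in seconds.
SPAWN_TIMEOUT_S = 15 * 60


def time_command(cmd):
    """Run cmd (a list of strings) and return (elapsed_seconds, return_code)."""
    start = time.monotonic()
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.monotonic() - start, result.returncode


def summarize(elapsed_s, timeout_s=SPAWN_TIMEOUT_S):
    """Report how much of the timeout budget a single run consumed."""
    return {
        "elapsed_s": elapsed_s,
        "fraction_of_timeout": elapsed_s / timeout_s,
        "would_time_out": elapsed_s >= timeout_s,
    }
```

      Calling time_command with whatever pull command applies on a given node, repeated over several nodes and images, would give a measured distribution to set against the timeout.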

          Activity

          krughoff Simon Krughoff added a comment -

          This seems to have been a problem with my environment. I cleared my .local and respawned successfully. Obviously, it would have been better to try things incrementally to see for sure whether the fix came from spawning on a different node or from cleaning up my environment, but I don't think we have enough information to do forensics like that at this time.

          Adam Thornton and Frossie Economou do you agree with that assessment?

          athornton Adam Thornton added a comment -

          It seems really weird that it would time out, but if you had something in .local that was somehow unsetting the token for communication back to the Hub, that could have caused a successful container spawn that never managed to check in with the Hub and say "I'm up and running."
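
          As an illustration of that failure mode (not something verified in this ticket), a check like the following could be run inside a spawned container. It assumes the token is carried in the JUPYTERHUB_API_TOKEN environment variable, which is what stock JupyterHub sets for single-user servers; whether the LDF deployment uses the same variable is an assumption, and hub_token_status is a hypothetical helper name.

```python
import os

# Assumption: the Hub token is passed via this variable, as in stock JupyterHub.
TOKEN_VAR = "JUPYTERHUB_API_TOKEN"


def hub_token_status(environ=None):
    """Return 'unset', 'empty', or 'set' for the Hub token variable."""
    env = os.environ if environ is None else environ
    value = env.get(TOKEN_VAR)
    if value is None:
        return "unset"
    if value == "":
        return "empty"
    return "set"
```

          A result of "unset" or "empty" after user startup files have run would be consistent with the explanation above, though it would not prove it.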

          krughoff Simon Krughoff added a comment -

          I'm happy to try other things including spawning multiple times. Is that a worthwhile exercise?


            People

            Assignee:
            krughoff Simon Krughoff
            Reporter:
            krughoff Simon Krughoff
            Reviewers:
            Frossie Economou
            Watchers:
            Adam Thornton, Frossie Economou, Simon Krughoff
