2021-09-13: Development Meeting

Kacper, Mike L, Mike H, Craig, Tim

Agenda

  • Updates
  • Recorded Runs UI preview ([name=Mike L])
    • WIP - rename/remove buttons exist, but need to use different endpoints at the top-level (see existing rename/delete version)
    • WIP - rename/remove should not be allowed outside of the top level
    • BUG - Names shouldn’t be required. Looking into a backend solution; possibly a bug in Girder itself and its Swagger parsing/generation?
    • Would be nice to show Run Status next to each Run - see RunStatus

Updates

  • Kacper

    • Uploading generic data/zip files as involatile data - https://github.com/whole-tale/girder_wholetale/pull/502
    • https://hackmd.io/KQ9TzJ7zSrqzaeYGX9ClSQ
  • Mike L

    • Updated Recorded Runs PR to extend Run > Files, added screenshots
      • Added “Tale Versions” nav to view / browse versions
      • Added “Recorded Runs” nav to view / browse runs
      • WIP for rename/remove
      • Added to header, but need to fix file path
      • Added buttons to launch a new Recorded Run
        • Remove name dialog once param is optional
  • Mike H

  • Craig

    • Supplement planning meetings
    • StataBuildPack draft changes
      • Desktop environment
      • install.do support
    • Updated staging for supplement
  • Tim

    • Now have a bpftrace program that monitors the full lifecycle of Linux processes. It’s small and looks like an awk script:

      #!/usr/bin/env bpftrace
      
      BEGIN {
          printf("------------------ Starting bpftrace -------------------\n");
          printf("\n");
          printf("  TIME(us)    EVENT   PROCESS  PROGRAM NAME     DETAILS\n");
          printf("  --------    -----   -------  ------------     -------\n");
      }
      
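      // A positive return value is the child's PID as seen by the parent;
      // 0 is the child's own return and negative values are failed clones.
      tracepoint:syscalls:sys_exit_clone /args->ret > 0/ {
          printf("%10u    %-6s %8d  %-15s  child=%d\n", elapsed/1000, "start",
                 pid, comm, args->ret);
      }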
      
      tracepoint:syscalls:sys_enter_execve {
          printf("%10u    %-6s %8d  %-15s  argv=", elapsed/1000, "exec", pid, comm); 
          join(args->argv); 
      }
      
      tracepoint:sched:sched_process_exit {
          printf("%10u    %-6s %8d  %-15s \n", elapsed/1000, "exit", pid, comm);
      }
      
      END {
          printf("\n------------------ Exiting bpftrace -------------------\n");
      }
      
    • Output when monitoring the date command issued at the bash prompt:

      $ time sudo ./scripts/watch_process_lifecycle.bt 
      Attaching 5 probes...
      ------------------ Starting bpftrace -------------------
      
      TIME(us)    EVENT   PROCESS  PROGRAM NAME     DETAILS
      --------    -----   -------  ------------     -------
       1599700    clone    121947  bash             child=128934
       1599930    exec     128934  bash             argv=date
       1601748    exit     128934  date   
      ------------------ Exiting bpftrace -------------------
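
As a rough illustration of how this output could be post-processed, the sketch below parses the three event lines shown above and computes the time between a process's exec and its exit. The regular expression and the function names `parse_trace` and `exec_to_exit_us` are hypothetical, not part of the bpftrace script; the sketch also assumes program names contain no spaces and that PIDs are not reused within one trace.

```python
import re

# Event lines copied from the trace output above.
TRACE = """\
 1599700    clone    121947  bash             child=128934
 1599930    exec     128934  bash             argv=date
 1601748    exit     128934  date
"""

# Columns: TIME(us), EVENT, PROCESS (pid), PROGRAM NAME, optional DETAILS.
LINE_RE = re.compile(r"^\s*(\d+)\s+(\w+)\s+(\d+)\s+(\S+)\s*(.*)$")


def parse_trace(text):
    """Return a list of event dicts, skipping banners and header lines."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # not an event line (banner, header, blank)
        time_us, event, pid, comm, details = match.groups()
        events.append({
            "time_us": int(time_us),
            "event": event,
            "pid": int(pid),
            "comm": comm,
            "details": details.strip(),
        })
    return events


def exec_to_exit_us(events):
    """Map pid -> microseconds between its first exec and its exit."""
    exec_time = {}
    durations = {}
    for ev in events:
        if ev["event"] == "exec":
            exec_time.setdefault(ev["pid"], ev["time_us"])
        elif ev["event"] == "exit" and ev["pid"] in exec_time:
            durations[ev["pid"]] = ev["time_us"] - exec_time[ev["pid"]]
    return durations


print(exec_to_exit_us(parse_trace(TRACE)))  # {128934: 1818}
```

For the sample above, the `date` process (PID 128934) lived 1601748 - 1599930 = 1818 microseconds between exec and exit.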
      
    • bpftrace adds less than 30% overhead, versus a 25x slowdown for ReproZip, on the Jupyter notebook I’m using to stress test recording of traces.

    • Negligible overhead if process exits are not monitored (ReproZip does not monitor them).

    • The bpftrace program is compiled and runs in the Linux kernel, so it can run in one container while the processes you want to monitor run in another. There is no need to give special capabilities to the latter (i.e. the WT user’s tale).

    • Also have a working script that shows how to start a bpftrace program in the background, wait for it to start recording, run some other script, and then shut down the bpftrace program. It still sees all targeted events in the kernel, not just those from child processes.
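
The start / wait / run / shut down sequence described above might look roughly like the following. This is a sketch, not the working script mentioned in the notes: the function name `run_with_tracer`, its parameters, and the choice of the "Starting bpftrace" banner as the readiness marker are all assumptions. It also assumes the tracer flushes its banner to the log promptly and stops cleanly on SIGINT (the signal Ctrl-C would send).

```python
import signal
import subprocess
import time
from pathlib import Path


def run_with_tracer(tracer_cmd, workload_cmd, log_path,
                    ready_marker="Starting bpftrace", timeout_s=10.0):
    """Run workload_cmd while tracer_cmd records in the background.

    Returns (workload return code, full tracer log text).
    """
    # Send tracer output to a file so a chatty tracer never blocks on a pipe.
    with open(log_path, "w") as log:
        tracer = subprocess.Popen(tracer_cmd, stdout=log)
    try:
        # Poll the log until the tracer prints its readiness marker.
        deadline = time.monotonic() + timeout_s
        while ready_marker not in Path(log_path).read_text():
            if tracer.poll() is not None:
                raise RuntimeError("tracer exited before it was ready")
            if time.monotonic() > deadline:
                raise TimeoutError("tracer did not become ready in time")
            time.sleep(0.05)
        # The tracer is recording; run the workload to completion.
        workload = subprocess.run(workload_cmd)
    except BaseException:
        tracer.kill()
        tracer.wait()
        raise
    # Ask the tracer to stop and give it a chance to print its closing output.
    tracer.send_signal(signal.SIGINT)
    tracer.wait()
    return workload.returncode, Path(log_path).read_text()
```

A call such as `run_with_tracer(["sudo", "./scripts/watch_process_lifecycle.bt"], ["date"], "trace.log")` would be the intended use. Whether SIGINT reaches bpftrace through `sudo` in a given setup is not verified here.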

    • Can collect trace in kernel and output back to user space afterwards; can output as triples directly.
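
To make the idea of triples concrete, here is one possible mapping from lifecycle events to (subject, predicate, object) triples. The predicate names (`spawned`, `executed`, `exitedAt`), the `process:<pid>` identifiers, and the function `events_to_triples` are invented for illustration; the notes do not say which vocabulary is used. Both the `start` label from the script and the `clone` label from the sample output are accepted for the clone event.

```python
def events_to_triples(events):
    """Convert (time_us, event, pid, comm, details) tuples into triples."""
    triples = []
    for time_us, event, pid, comm, details in events:
        proc = f"process:{pid}"
        if event in ("clone", "start"):
            # details looks like "child=128934"
            child = details.split("=", 1)[1]
            triples.append((proc, "spawned", f"process:{child}"))
        elif event == "exec":
            # details looks like "argv=date"
            triples.append((proc, "executed", details.split("=", 1)[1]))
        elif event == "exit":
            triples.append((proc, "exitedAt", time_us))
    return triples


# Events taken from the sample trace of the date command.
sample = [
    (1599700, "clone", 121947, "bash", "child=128934"),
    (1599930, "exec", 128934, "bash", "argv=date"),
    (1601748, "exit", 128934, "date", ""),
]

for triple in events_to_triples(sample):
    print(triple)
```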

    • Will look into recording I/O this week; then interprocess communication (e.g. pipes between parent and child processes such as a Jupyter cell gathering stdout from an external command).