Flatiron weekly progress: Sep30-Oct4

# ironclust v2
– […] run memory benchmark
– […] update spikeforest website benchmark
– [x] plot quality comparison
– [x] update the spikeforest wrapper

# Dan English
– create SNR distribution plot
– compare with other datasets

# paper writing
– jeremy flow chart
– contribute to spikeforest

# misc
– [x] ottawa travel reimbursement

# Computer maintenance
## Ubuntu
– [x] VNC viewer installation
– [x] yakuake terminal: sudo apt-get install yakuake
– [x] vscode (got stuck, can’t install code on terminal)

## Windows
– [x] Windows 10 install on moneyboxwin
– [x] office 365, TreeSizeFree, KarenReplicator
– [x] Copy 5GB backup drive

## Disk drive
– […] Initialize RAID48GB
– [ ] Copy recordings to 48GB
– [ ] Copy personal files to 48GB
– [ ] Build 60 GB Linux partition, put in recordings
– [x] Build 48 GB backup drive (RAID5), put in all recordings

irc2 development log

auto-merge: using feature RMS instead of waveform correlation

dataset: hybrid_janelia_static
(64ch, 1200s, 72 units, 30KS/s)
fParfor=0
Runtime (s):
Detect + feature (s): 132.0s
Cluster (s): 94.0s
Automerge (s): 17.4s
Total runtime (s): 243.4s
Runtime speed x4.9 realtime
memory usage (GiB):
detect(GiB): 0.900
sort(GiB): 0.380
Runtime (s):
Detect + feature (s): 57.5s
Cluster (s): 30.6s
Automerge (s): 19.6s
Total runtime (s): 107.6s
Runtime speed x11.2 realtime
memory usage (GiB):
detect(GiB): 1.090
sort(GiB): 0.482
(4 local workers)
Runtime (s):
Detect + feature (s): 86.9s
Cluster (s): 48.7s
Automerge (s): 14.1s
Total runtime (s): 149.7s
Runtime speed x8.0 realtime
memory usage (GiB):
detect(GiB): 4.192
sort(GiB): 0.577
(20 local workers)
Runtime (s):
Detect + feature (s): 76.1s
Cluster (s): 22.1s
Automerge (s): 10.7s
Total runtime (s): 108.9s
Runtime speed x11.0 realtime
memory usage (GiB):
detect(GiB): 4.169
sort(GiB): 0.560
(20 remote workers)
Runtime (s):
Detect + feature (s): 58.4s
Cluster (s): 19.2s
Automerge (s): 9.3s
Total runtime (s): 86.9s
Runtime speed x13.8 realtime
memory usage (GiB):
detect(GiB): 4.221
sort(GiB): 0.743
**-p gpu="gpures:2"**
Runtime (s):
Detect + feature (s): 38.2s
Cluster (s): 12.4s
Automerge (s): 9.9s
Total runtime (s): 60.5s
Runtime speed x19.8 realtime
memory usage (GiB):
detect(GiB): 4.174
sort(GiB): 0.334
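
The "Runtime speed" lines in the log above are consistent with dividing the recording duration (1200 s) by the total runtime. A minimal sketch of that arithmetic, assuming this is how the factor is defined (the function name is made up for illustration and is not part of irc2):

```python
def realtime_factor(recording_s: float, runtime_s: float) -> float:
    """How many times faster than real time a run finished (assumed definition)."""
    return recording_s / runtime_s


RECORDING_S = 1200.0  # recording duration from the dataset description

# Total runtimes (s) copied from the log entries, in order of appearance
total_runtimes = [243.4, 107.6, 149.7, 108.9, 86.9, 60.5]

factors = [round(realtime_factor(RECORDING_S, t), 1) for t in total_runtimes]
for t, f in zip(total_runtimes, factors):
    print(f"{t:6.1f} s total -> x{f} realtime")
```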

irc2 post merging using position and amplitude of clusters

Use Gaussian kernel smoothing (make sure the kernel falls off to half its peak at half the mindist). Normalize by projecting a uniform field and ensuring a uniform field comes back.

The advantage of this approach is that the estimated peak location is robust to which site is chosen as the peak site.

Gaussian kernel convolved, with maximum slope at the minimum separation distance (sigma=d_min).
Inferring spike position using PC1 is more precise than using other components.
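
The kernel notes above can be sketched in a few lines. This is only an illustration under stated assumptions, not the irc2 code: the site layout and all function names are hypothetical. It shows two ways of choosing sigma (half-maximum at half the minimum distance, or sigma = d_min, which puts the steepest slope of the Gaussian at d_min) and the uniform-field normalization, where the smoothed values are divided by the smoothed all-ones field.

```python
import numpy as np


def sigma_for_half_max(d_half: float) -> float:
    """Sigma such that exp(-r^2 / (2 sigma^2)) equals 0.5 at r = d_half."""
    return d_half / np.sqrt(2.0 * np.log(2.0))


def gaussian_kernel(dist, sigma):
    """Unnormalized Gaussian weight as a function of distance."""
    return np.exp(-0.5 * (np.asarray(dist) / sigma) ** 2)


def smooth_field(values, site_xy, sigma):
    """Kernel-smooth per-site values so that a uniform field stays uniform."""
    diff = site_xy[:, None, :] - site_xy[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    w = gaussian_kernel(dist, sigma)
    # Dividing by the smoothed all-ones field compensates for edge sites
    # that have fewer neighbours inside the kernel.
    return (w @ values) / (w @ np.ones_like(values))


# Hypothetical two-column probe layout in micrometres (not from the notes)
site_xy = np.array([[x, y] for y in range(0, 200, 25) for x in (0.0, 25.0)])
d_min = 25.0

sigma_half = sigma_for_half_max(d_min / 2)  # half-maximum at d_min / 2
sigma_slope = d_min                         # steepest slope at r = d_min

uniform_out = smooth_field(np.ones(len(site_xy)), site_xy, sigma_half)
```

With this normalization, feeding in a constant field returns the same constant at every site, which is the sanity check described in the notes.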
Great study music helping me to focus

irc2 development log

# fixed automerging issue
– Spike indexing was incorrect when extracting trPc 3D array.
– waveform shifting produced comparable results

# todo
– [x] compute rho and delta using parallel resources
– [x] compare performance between irc and irc2
– [ ] add drift correction and compare drift performance

# Runtime comparison
– dataset: static_siprobe\rec_64c_1200s_11
– irc.m: 123s, mean accuracy: 89.8, 62 above .8 accuracy, 1.8GB
– irc2.m (fGpu=1,fParfor=0): 64s, mean accuracy: 89.3, 62 above .8 accuracy, .776GB

# irc2.m speed test (drift correction not implemented yet)
– fGpu=1, fParfor=0: 64s
– fGpu=0, fParfor=0: 389s
– fGpu=0, fParfor=1: 158.6s (20 nodes, local)

# irc2.m test on linux workstation
– fGpu=0, fParfor=0: 292.5s
– fGpu=0, fParfor=1: 133s (20 nodes, remote)
– fGpu=1, fParfor=0: 45s
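
To compare the configurations, the timings from the linux workstation test can be turned into speedups relative to the single-threaded CPU run. A small sketch, with the numbers copied from the list above; the choice of baseline is my own.

```python
# Baseline: fGpu=0, fParfor=0 on the linux workstation
baseline_s = 292.5

# Other configurations and their runtimes (s), copied from the list above
runs = {
    "fGpu=0, fParfor=1 (20 nodes, remote)": 133.0,
    "fGpu=1, fParfor=0": 45.0,
}

speedups = {name: baseline_s / t for name, t in runs.items()}
for name, s in speedups.items():
    print(f"{name}: x{s:.1f} faster than the CPU-only, no-parfor run")
```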

Losing weight by walking to work and back

2019 Sep 22: 93 KG

Action: walked for an hour to get to work, and will do the same on the way back. That’s two hours of walking per day. I will also save $20 a day by not taking the ferry. Each day I will lose 1 KG, save $20, and spend an extra hour commuting to work and back. In a month I will be 30 KG lighter and $600 richer.

sep 19 2019 @ flatiron

Start: 10:30 AM
Goals: memory loop plot, data backup, v4.9.5 debug with Bapun

# Memory test status
Still running. param_set2.prm includes cached results, so I need to account for file read time; the read speed should be about 1GB/s.
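
A rough way to budget for that read time is to divide the file size by the expected throughput. The sketch below assumes a raw recording stored as int16 (2 bytes per sample). The notes do not state the sample format, so that is an assumption, and the 64-channel, 1200 s, 30 kS/s figures are used only as an example size.

```python
def read_time_s(n_bytes: float, throughput_gb_per_s: float = 1.0) -> float:
    """Estimated time to read n_bytes at a given throughput (1 GB = 10^9 bytes)."""
    return n_bytes / (throughput_gb_per_s * 1e9)


# Example size: 64 channels, 1200 s, 30 kS/s, int16 samples (assumed format)
n_channels, duration_s, sample_rate_hz, bytes_per_sample = 64, 1200, 30_000, 2
recording_bytes = n_channels * duration_s * sample_rate_hz * bytes_per_sample

estimate_s = read_time_s(recording_bytes)
print(f"{recording_bytes / 1e9:.3f} GB -> about {estimate_s:.1f} s at 1 GB/s")
```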

# Bapun v4.9.5 vs v4.9.11 comparison
No obvious difference. A formatting issue is suspected. Run his dataset tomorrow using his own parameters. Also run it using the makeprm command.

# Dan English Dataset library
Not downloaded yet. I should do this after making the parforeval command.

# Disk backup
The new 4-bay disk enclosure is set up. Each can hold 40TB (RAID0). My personal data will be on RAID5 (30TB), and the recordings will be on RAID0 and also stored in CEPH. Linux gets 20TB of scratch (RAID0) to be managed by LVM. Eventually I will invest in 14TBx4, which gives 12TB extra with RAID5. This will come at a $1600 price tag. I can only afford an enclosure (30TB) at this point. I will set up a 10TB scratch drive in Linux and 10TB in Windows. I will keep a copy of the data at home to be used with my Lenovo.
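
The capacity figures above follow from the usual RAID arithmetic: RAID0 uses every drive, while RAID5 gives up one drive's worth of space to parity. A minimal sketch, assuming four 10TB drives per enclosure (inferred from the 40TB RAID0 and 30TB RAID5 figures, not stated outright) and ignoring filesystem overhead:

```python
def usable_tb(n_drives: int, drive_tb: float, raid_level: int) -> float:
    """Nominal usable capacity in TB for RAID0 or RAID5 (no filesystem overhead)."""
    if raid_level == 0:
        return n_drives * drive_tb
    if raid_level == 5:
        if n_drives < 3:
            raise ValueError("RAID5 needs at least 3 drives")
        return (n_drives - 1) * drive_tb  # one drive's worth goes to parity
    raise ValueError(f"unsupported RAID level: {raid_level}")


raid0_now = usable_tb(4, 10, 0)      # 4 x 10TB striped
raid5_now = usable_tb(4, 10, 5)      # 4 x 10TB with parity
raid5_upgrade = usable_tb(4, 14, 5)  # planned 4 x 14TB with parity
print(raid0_now, raid5_now, raid5_upgrade - raid5_now)
```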

## Final goal
– 40TB Hitachi RAID5 keep at home (enclosure ordered, fill with personal data)
– 40TB WD RAID0 keep @ work (filled with recordings, hardware RAID)
– 40TB WD RAID0 windows enclosure (temp data keeping purpose)
– 20TB Hitachi keep in Moneybox (10TB for windows, 10TB for ubuntu)

## Data migration plan
– day0: empty 80TB WD RAID5 tower to 40TB RAID0 Hitachi (recordings) and 20TB Hitachi drives (personal)
– day1: copy RAID0 Hitachi (recordings) to CEPH via Globus (over the weekend, ~30TB)
– day2: bring 4-bay enclosure from home (Monday), add 40TB WD, copy from 40TB Hitachi RAID0 overnight
– day2: Change 40TB Hitachi to RAID5 and copy personal data (20TB Hitachi) overnight
– day3: Take 40TB Hitachi RAID5 home, borrow 40TB WD to set up linux RAID

Flatiron workstation LVM RAID setup

# Dell precision linux setup (T7910)
– Dylan suggested RAID5 XFS on the CentOS workstation. RAID is managed by LVM (logical volume manager).
– The plan is to generate the dataset and run the benchmark on this workstation. When my own linux box is cleared, I can clone the setup.
– Dylan disabled the last access time field, a performance hack.
– Currently generating the test dataset. The RAID is still being built, but writing to disk is already possible. He recommended softraid.
– Why not use CEPH? Because of read/write variability on the workstation; this is why I want a local RAID. I am dealing with large file reads and writes, and I want to keep all the processing local.

# Linux thunderbolt
Install bolt: sudo apt install bolt
Restart and mount.
Dell workstation has Thunderbolt port available.

# Memory profiling caveat
In Ubuntu, the memory profiling precision is above 20MB (1 MB is 10^6 bytes whereas 1 MiB is 2^20 bytes).
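
Since the benchmark logs report memory in GiB while this caveat is in MB, a small conversion helper keeps the units straight. A minimal sketch; the function names are made up for illustration.

```python
MB = 10**6    # megabyte (decimal)
MIB = 2**20   # mebibyte (binary)
GIB = 2**30   # gibibyte (binary)


def mb_to_mib(mb: float) -> float:
    """Convert decimal megabytes to binary mebibytes."""
    return mb * MB / MIB


def gib_to_mb(gib: float) -> float:
    """Convert binary gibibytes to decimal megabytes."""
    return gib * GIB / MB


print(f"20 MB = {mb_to_mib(20):.2f} MiB")
print(f"1 GiB = {gib_to_mb(1):.1f} MB")
```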