- Parallel Computing Course - Stanford CS149, Fall 2023
- Performance-Aware Programming Series by Casey Muratori
- Algorithms for Modern Hardware
- Computer Systems: A Programmer's Perspective, 3/E - by Randal E. Bryant and David R. O'Hallaron, Carnegie Mellon University
- Performance Engineering of Software Systems - an MIT OCW course
- Parallel Programming 2020 by NHR@FAU
- CPU Caches and Why You Care - by Scott Meyers
- [Optimizing a ring buffer for throughput](https://rigtorp.se/ringbuffer/)
```objc
/*
 * Creator: Naman Dixit
 * Notice: © Copyright 2024 Naman Dixit
 * License: BSD Zero Clause License
 * SPDX: 0BSD (https://spdx.org/licenses/0BSD.html)
 * Language: -*- objective-c -*-
 */

/*
 * This is a minimal Objective-C runtime designed for easy embeddability
 * and DLL-based hot-reloading.
 */
```
[Demo video: 3.5 fps on a Paperwhite 3, via @adtac_]
mobileread.com is your best resource here; follow the instructions from the LanguageBreak thread.
I didn't really follow the LanguageBreak instructions, because I didn't care about most of the features and I was curious to do it myself, but the LanguageBreak GitHub repo was invaluable for debugging.
Every once in a while I investigate low-level backend options for PLs, although so far I haven't actually written any such backend for my projects. Recently I've been looking at precise garbage collection in popular backends, and I've been (as on previous occasions) annoyed by the limitations and compromises.
I was compelled to think about a system which accommodates precise relocating GC as much as possible. In one extreme configuration, described in this note, there
This worked on 14/May/23. The instructions will probably require updating in the future.
LLaMA is a text-prediction model, similar to GPT-2 or to the version of GPT-3 that has not yet been fine-tuned. It is also possible to run fine-tuned versions with this (like Alpaca or Vicuna, I think; those versions are more focused on answering questions).
Note: I have been told that this does not support multiple GPUs. It can only use a single GPU.
It is possible to run LLaMA 13B with a 6GB graphics card now (e.g. an RTX 2060), thanks to the amazing work on llama.cpp. The latest change is CUDA/cuBLAS support, which lets you pick an arbitrary number of transformer layers to run on the GPU. This is perfect for low VRAM.
- Clone llama.cpp from git; I am on commit 08737ef720f0510c7ec2aa84d7f70c691073c35d.
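As an aside, the same layer-offload knob is also exposed through the llama-cpp-python bindings. This is only a sketch of the idea, not the CLI workflow these steps build; the model path below is hypothetical, and `n_gpu_layers` is the parameter that controls how many transformer layers land on the GPU:

```python
# Sketch using the llama-cpp-python bindings (assumption: installed with
# cuBLAS support); the model path below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/13B/ggml-model-q4_0.bin",  # hypothetical path
    n_gpu_layers=20,  # how many transformer layers to offload to the GPU
)

result = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(result["choices"][0]["text"])
```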
I recently discovered a relatively obscure algorithm for calculating the digits of pi: https://en.wikipedia.org/wiki/Gauss–Legendre_algorithm.
Well, at least obscure compared to Chudnovsky's. Wikipedia notes that it is "memory-intensive" but is it really?
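Before comparing, here is a quick sketch of the iteration in Python with mpmath (my own transcription of the Wikipedia pseudocode, not the code from this post). Each step replaces a and b with their arithmetic and geometric means, and the number of correct digits roughly doubles per iteration:

```python
# Gauss-Legendre iteration for pi, sketched with mpmath.
from mpmath import mp, mpf, sqrt

def gauss_legendre_pi(digits):
    mp.dps = digits + 10                  # working precision with guard digits
    a, b = mpf(1), 1 / sqrt(mpf(2))
    t, p = mpf(1) / 4, mpf(1)
    while abs(a - b) > mpf(10) ** (-digits):
        # t uses the *old* a and b, so update everything in one tuple step
        a, b, t, p = (a + b) / 2, sqrt(a * b), t - p * ((a - b) / 2) ** 2, 2 * p
    return (a + b) ** 2 / (4 * t)

print(gauss_legendre_pi(50))
```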
Let's compare to the MPFR pi function:
```julia
function gauss_legendre(prec)
    setprecision(BigFloat, prec, base=10)
    GC.enable(false)
```

As part of a holiday D&D one-shot session where Santa Claus's toy factory had been sabotaged, our dungeon master presented to us, a group of Christmas elves, a riddle to solve.
Nine cards labeled with the names of Santa's reindeer were presented to us. The instructions indicated that we had to find the order the reindeer were in, according to this riddle:
Vixen should be behind Rudolph, Prancer and Dasher, whilst Vixen should be in front of Dancer and Comet. Dancer should be behind Donder, Blitzen and Rudolph. Comet should be behind Cupid, Prancer and Rudolph. Donder should be behind Comet, Vixen, Dasher, Prancer and Cupid. Cupid should be in front of Comet, Blitzen, Vixen, Dancer and Rudolph. Prancer should be in front of Blitzen, Donder and Cupid. Blitzen should be behind Cupid but in front of Dancer, Vixen and Donder. Rudolph should be behind Prancer but in front of Dasher, Dancer and Donder.
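Each clause is just a precedence constraint, so the riddle can be attacked mechanically with a topological sort. Below is a rough Python sketch (my own encoding of the clauses, not the DM's answer key). Note that the clauses quoted above never order Blitzen and Dasher relative to each other, so `static_order()` prints just one of the valid orders:

```python
from graphlib import TopologicalSorter

# "x is in front of y" edges, distilled from the riddle's clauses
in_front_of = {
    "Prancer": ["Vixen", "Comet", "Donder", "Blitzen", "Cupid", "Rudolph"],
    "Cupid":   ["Comet", "Blitzen", "Vixen", "Dancer", "Rudolph", "Donder"],
    "Rudolph": ["Vixen", "Dancer", "Comet", "Dasher", "Donder"],
    "Dasher":  ["Vixen", "Donder"],
    "Blitzen": ["Dancer", "Vixen", "Donder"],
    "Vixen":   ["Dancer", "Comet", "Donder"],
    "Comet":   ["Donder"],
    "Donder":  ["Dancer"],
}

# TopologicalSorter expects node -> predecessors, so invert the edges
behind = {}
for front, backs in in_front_of.items():
    for b in backs:
        behind.setdefault(b, set()).add(front)

print(list(TopologicalSorter(behind).static_order()))
```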
```c
/*
 * This started from coding ternary heapsort while on my lunch break.
 * It then occurred to me that the heapify algorithm could be easily
 * extended to any m-ary heap (except, of course, a 1-ary heap (list?)).
 * As shown, heapify creates a pentary heap; changing the #define MARY
 * value sets the branching factor of the heap.
 */
#include <stdio.h>
#include <stdlib.h>
```
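To make the generalization concrete, here is a short Python sketch of the same idea (my own reconstruction, not a translation of the C file above): in an m-ary heap the children of node i live at m*i+1 through m*i+m, and heapify sifts down every internal node starting from the last parent.

```python
MARY = 5  # branching factor, like the C #define

def sift_down(a, start, end, m=MARY):
    """Push a[start] down until the m-ary max-heap property holds on a[start..end]."""
    root = start
    while True:
        first_child = m * root + 1
        if first_child > end:
            return
        # pick the largest among the (up to m) children
        largest = max(range(first_child, min(first_child + m, end + 1)),
                      key=lambda i: a[i])
        if a[largest] > a[root]:
            a[root], a[largest] = a[largest], a[root]
            root = largest
        else:
            return

def heapsort(a, m=MARY):
    n = len(a)
    # heapify: sift down every internal node, last parent first
    for i in range((n - 2) // m, -1, -1):
        sift_down(a, i, n - 1, m)
    # repeatedly move the max to the end and restore the heap
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end - 1, m)

data = [7, 2, 9, 4, 1, 8, 3, 6, 5]
heapsort(data)
print(data)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```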