Avatar of Adam Fields

Notes

Updates from me with curated content from across the web.

Grok-1

X.ai just open-sourced Grok-1, a 314-billion parameter mixture-of-experts model.

The repo has a torrent magnet link to download the checkpoint, which weighs in at over 300GB. You’ll probably need a handful of A100 80GBs, which are $6/hr apiece right now according to Brev.
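The "handful of A100s" figure falls out of some quick memory math. Here's a sketch (my own arithmetic; real runtime needs are higher once you add activations and KV cache):

```python
# Rough GPU count needed just to hold a checkpoint's weights.
import math

def gpus_needed(params_billions, bytes_per_param, gpu_gb=80):
    weight_gb = params_billions * bytes_per_param  # 1B params * 1 byte ≈ 1 GB
    return weight_gb, math.ceil(weight_gb / gpu_gb)

print(gpus_needed(314, 1))  # int8 weights → (314, 4)
print(gpus_needed(314, 2))  # bf16 weights → (628, 8)
```

So even at 8-bit you're at four 80GB cards before any runtime overhead.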

This is a base model, so it’s not fine-tuned for a specific task like chat or instruct, but those will show up on Hugging Face soon, I’m sure. Elon is my boy again 🤗

ai

Gemma

Google released Gemma, a new family of open-weight models related to Gemini. Hugging Face also published their own blog post about it.

Initially there are 2B and 7B sizes, with 8k context, available in base and instruct-tuned variants. The latter (instruct) are already available in Perplexity Labs to play around with.

Google also open-sourced a number of supporting repositories on GitHub.

Finally, there are official guides on how to use Gemma with KerasNLP including how to fine-tune using LoRA.
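For a rough sense of why LoRA makes fine-tuning these models tractable, here's a small sketch (my own arithmetic, not code from Google's guides) of how few parameters a low-rank adapter actually trains for a single d×k weight matrix:

```python
def lora_params(d, k, rank):
    # LoRA freezes the d x k base weight and trains two low-rank
    # factors instead: A (d x rank) and B (rank x k).
    base = d * k
    adapter = rank * (d + k)
    return base, adapter

# hypothetical 4096 x 4096 layer with rank-4 adapters
base, adapter = lora_params(4096, 4096, 4)
print(f"base: {base:,}  adapter: {adapter:,}  ratio: {adapter / base:.4%}")
```

At rank 4 you're training a fraction of a percent of the layer's weights, which is why it fits on a single consumer GPU.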

ai

verto.sh

verto is a web app from Luca Cavallin for discovering open-source issues to work on. You can filter by language or tag, as well as search.

The available issues come from a curated selection of repos, filtered by labels. The inclusion criteria are in the README.

My New Year’s resolution every year is to contribute to OSS more, so this could be exactly what I’ve been looking for. Beyond the utility, the design is amazing. It’s built with Tailwind on Next.js using octokit.
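verto itself uses octokit, but the same kind of filtered lookup is easy to sketch against GitHub's REST search API directly. The endpoint and search qualifiers below are real; the specific language/label combination is just an example:

```python
# Build a GitHub issue-search URL filtered by language and label,
# the same shape of query a tool like verto runs under the hood.
from urllib.parse import urlencode

def issue_search_url(language, label):
    q = f'is:issue is:open language:{language} label:"{label}"'
    return "https://api.github.com/search/issues?" + urlencode({"q": q})

url = issue_search_url("go", "good first issue")
print(url)
```

Fetching that URL (unauthenticated requests are rate-limited) returns JSON with an `items` list of matching issues.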

dev

Preordered Rabbit R1

I bought a Rabbit R1.

It’s a handheld device designed by Teenage Engineering. The founder, Jesse Lyu, sold his previous company to Baidu.

I thought it was just a pocket LangChain or a voice controller for Perplexity, but it’s actually using a proprietary Large Action Model.

I tried not to, but Perplexity offered 1 year of Pro for the first 100,000 orders, so I couldn’t refuse.

I won’t get it until July though…

hw

Pocketbase

Pocketbase is a single-file binary written in Go with:

  • an embedded SQLite database with realtime subscriptions
  • file handling with static serving
  • user administration and email
  • a REST API with admin dashboard
  • auth with basic, JWT, and OAuth2
  • JS and Dart client SDKs

You can also use it as a framework to build on top of in Go.
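As a sketch of how little client code the REST API needs, here's a stdlib-only Python request builder for Pocketbase's documented records endpoint. The `posts` collection is a made-up example; port 8090 is Pocketbase's default:

```python
# Build a request against Pocketbase's list-records endpoint:
# GET /api/collections/{collection}/records
from urllib.request import Request

def list_records_request(base_url, collection, page=1, per_page=30):
    url = (f"{base_url}/api/collections/{collection}/records"
           f"?page={page}&perPage={per_page}")
    return Request(url, headers={"Accept": "application/json"})

req = list_records_request("http://127.0.0.1:8090", "posts")
# urllib.request.urlopen(req) would return paginated JSON records
```

The official JS and Dart SDKs wrap these same routes, so dropping down to raw HTTP from any other language is straightforward.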

dev

Hugging Face Tasks

Tasks is a page on the 🤗 website. They published a related package a couple of months ago, so it’s fairly new.

From the README:

The Task pages are made to lower the barrier of entry to understand a task that can be solved with machine learning and use or train a model to accomplish it. It’s a collaborative documentation effort made to help out software developers, social scientists, or anyone with no background in machine learning that is interested in understanding how machine learning models can be used to solve a problem.

Check out the Text Classification page as an example. There is a video explanation, a README with examples of the task, and even the metrics used to evaluate the models for that task. There are also links to 🤗 resources like Models, Datasets, Spaces, Autotrain, and Endpoints.

ai

Mistral API

Got my beta invite for Mistral’s new API, la plateforme. Here’s a current pricing table with OpenAI and Perplexity for comparison:

| API        | Model        | Price / 1M tokens (in) | Price / 1M tokens (out) |
| ---------- | ------------ | ---------------------- | ----------------------- |
| Mistral    | Medium       | €2.50                  | €7.50                   |
| Mistral    | Small (8x7B) | €0.60                  | €1.80                   |
| Mistral    | Tiny (7B)    | €0.14                  | €0.42                   |
| OpenAI     | GPT-4        | $10.00                 | $30.00                  |
| OpenAI     | GPT-3.5      | $1.00                  | $2.00                   |
| Perplexity | 70B          | $0.70                  | $2.80                   |
| Perplexity | 34B          | $0.35                  | $1.40                   |
| Perplexity | 7B           | $0.07                  | $0.28                   |

The noteworthy inclusion here is Mistral’s new “medium” model, currently one of the top models on the arena.
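Using the table's prices, here's a quick sketch (my own arithmetic) of what a single request costs. Note Mistral bills in EUR and the others in USD, and no FX conversion is applied:

```python
# Per-request cost from per-1M-token prices.
PRICES = {  # (input, output) per 1M tokens, in each provider's currency
    "mistral-medium": (2.50, 7.50),
    "gpt-4": (10.00, 30.00),
}

def request_cost(model, tokens_in, tokens_out):
    p_in, p_out = PRICES[model]
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000

# 10k prompt tokens + 1k completion tokens
print(round(request_cost("mistral-medium", 10_000, 1_000), 4))  # 0.0325
print(round(request_cost("gpt-4", 10_000, 1_000), 4))           # 0.13
```

So Medium comes out roughly 4x cheaper than GPT-4 on that traffic shape, even before the EUR/USD spread.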

ai

Llamafile: Single-file portable LLMs

Incredible project from Justine Tunney and Mozilla. A llamafile is a compiled LLM, weights and all. It uses Georgi Gerganov’s llama.cpp compiled with Justine’s Cosmopolitan library. The resulting binaries are APEs (actually portable executables).

Because it’s based on llama.cpp, it also ships with the web UI from Tobi’s PR. Aside from that being one of my favorite PRs of all time (I worked at Tobi’s company), it’s a pretty interesting read on how to jam a web app into a C program.

The embedded server also provides an OpenAI-compatible REST API for local development.
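Since the API is OpenAI-compatible, the usual chat-completions request shape works against it. Here's a stdlib-only sketch; the model name and prompt are placeholders, and the server is assumed to be running on localhost:8080:

```python
# Build an OpenAI-style chat-completions request aimed at the
# local llamafile server.
import json
from urllib.request import Request

payload = {
    "model": "LLaVA",  # local server serves whatever weights it loaded
    "messages": [{"role": "user", "content": "Describe this image."}],
}
req = Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would return the completion JSON
```

This also means most OpenAI client libraries work locally if you just point their base URL at the llamafile server.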

Justine has 🤗 repos with compiled llamafiles. Note the resolve path segment in the URL. This resolves to an AWS CloudFront URL so large files download fast. Here’s how to run LLaVA:

# download to ~/.local/bin/llava and make executable
wget -O ~/.local/bin/llava https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4.llamafile
chmod +x ~/.local/bin/llava

# localhost:8080
llava

On Windows/WSL2 it’s slightly different. You’re limited to 4GB models like the quantized LLaVA 7B unless you compile your own and use external weights.

Note that you probably need a working CUDA and cuDNN environment first.

# if you see "APE is running on WIN32 inside WSL", one of these should fix it
sudo sh -c 'echo -1 > /proc/sys/fs/binfmt_misc/WSLInterop'
sudo sh -c 'echo -1 > /proc/sys/fs/binfmt_misc/WSLInterop-late'

# set GPU layers to 35
llava -ngl 35

Justine published Bash one-liners you can try with the llamafiles.

Be sure to also check out the impressive Ollama project.

ai

CLI Guide

I found this one-page guide to writing command-line interfaces (CLIs) while working on my dotfiles. It’s from the creators of docker-compose, at the easy-to-remember clig.dev. From the foreword:

Inspired by traditional UNIX philosophy, driven by an interest in encouraging a more delightful and accessible CLI environment, and guided by our experiences as programmers, we decided it was time to revisit the best practices and design principles for building command-line programs.

It’s written like a nice-to-read book. The first half, Philosophy, would be appreciated by anyone who enjoys good technical writing. If you want to get into best practices for things like telemetry and signal handling, then the second half, Guidelines, is for you.
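As a taste of the Guidelines half, here's a minimal sketch applying a few of them with Python's argparse: `--help` comes for free, human-readable text is the default, and machine-readable output hides behind a flag. The `greet` command itself is made up for illustration:

```python
# Tiny CLI following a few clig.dev guidelines.
import argparse
import json

def build_parser():
    p = argparse.ArgumentParser(prog="greet", description="Greet someone.")
    p.add_argument("name", help="who to greet")
    p.add_argument("--json", action="store_true",
                   help="emit machine-readable JSON instead of text")
    return p

def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.json:
        return json.dumps({"greeting": f"Hello, {args.name}!"})
    return f"Hello, {args.name}!"

if __name__ == "__main__":
    print(main())
```

Even a toy like this gets you `greet --help`, a stable JSON contract for scripts, and friendly default output for humans.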

dev

Perplexity.ai

Perplexity is a new AI startup. Their app combines online search with LLMs, similar to how ChatGPT uses Bing. A cool feature is being able to pick which LLM you want to use: OpenAI, Anthropic, and Perplexity’s in-house models are all available.

They also have a free playground at labs.pplx.ai where you can experiment with open-weight models like Llama and Mistral.

Their Pro plan costs the same as ChatGPT Plus ($20/mo), but it also includes $5/mo of API credits (millions of tokens). I got two months of Pro for free (see below).

ai