Daniel Liden

Blog / About Me / Photos / LLM Fine Tuning / Notes /

Posts Archive

(2024-03-29) Retrieving Data for the H2o RAG Benchmark

I was looking for a good dataset to use for comparing different models in a RAG application when I found this post on Reddit. It compares a bunch of models on a collection of questions over a set of documents provided by H2O.ai.

I wasn't super interested in the benchmark, but the files (mostly pdfs, one mp3, jpg, other file types) interested me for use in my own testing. This short post shows how to get them using the scripts provided by h2o.ai.

(2024-03-25) Making headings for recurring tasks in org mode

This short post shows how to use the org-clone-subtree-with-time-shift command to make org headings for recurring tasks. I was recently trying to add a five-week class to my org agenda and I didn't want to manually create each heading and add or modify the timestamp. This approach made it very easy.

(2023-12-17) Introduction to Emacs Hooks

Today I was customizing the appearance of org files displayed with org-tree-slide. In particular, I wanted to increase the font size and start Olivetti mode whenever I started org-tree-slide-mode and return everything to normal when I was done. This, I quickly discovered, required the use of hooks. Hooks are not especially complicated, but they are useful and worth taking a few minutes to understand. This post will cover the basics of working with hooks in emacs.

(2023-07-09) YASnippet for Prompt Templates for Chatgpt-Shell

The wonderful chatgpt-shell package by Xenodium lets you interact with the gpt-3.5 and gpt-4 APIs in emacs via a handy shell built on top of comint-mode. It also integrates well with org-mode.

I find that I tend to re-use a few prompt patterns for specific tasks. Yasnippet provides a great way to create prompt templates made up of some fixed component with placeholders for user input. I can easily insert these prompt templates when working with chatgpt-shell to gain easy access to reusable, task-specific prompts. This post describes how to start using Yasnippet for prompt templates for use with chatgpt-shell.

(2023-06-01) Writing on AI and Postgres

Since this start of this year, I've been working on and writing about AI tools for working with Postgres databases. Most of this work has involved finding different ways to integrate ChatGPT (and previously Codex) with other tools and workflows. I wanted to collect and share some of that writing here, as it's related to a lot of the other things I write about on my personal blog.

(2023-03-10) Using the ChatGPT API with Julia Part 2: Defining a Chat Struct

One of the things that makes working with the ChatGPT API a little different from working with, e.g., the davinci-text-003 model api is the need to maintain the history of a given chat session. A Julia Struct containing the chat history, coupled with a function that acts on that Struct, provides a good way to work with the ChatGPT API.

For the basics of working with the ChatGPT API, check out part 1.

(2023-03-04) Using the ChatGPT API with Julia Part 1: the HTTP.jl Library

This brief post shows the basics of using the Julia HTTP library to interact with the OpenAI ChatGPT API, which was made public a few days ago. This post will only include the minimum necessary detail for getting started with the API. Future posts will go into a little more detail on how to send message histories and engage more interactively with the API.

(2022-12-22) Using Quarto Files with Denote

The latest release of Denote (by the inimitable Protesilaos Stavrou) introduced support for custom file types in addition to the defaults, Org, Markdown+YAML, Markdown+TOML, and plain text. This post shows how to add Quarto files (.qmd). Quarto, the successor to R Markdown, is "an open-source scientific and technical publishing system" with support for Python, R, Julia, and Observable. The setup detailed here will allow one to choose the .qmd filetype when creating a new Denote file.

(2022-09-18) Processing a JSON API Response with jq

There are countless ways of processing JSON data and converting it to different formats. Historically, I've used Python and loaded the data into a Pandas Dataframe for processing. This isn't really necessary for simple tasks, though. Sometimes, a lightweight command line tool does the job just fine. Enter jq. jq is "like sed for JSON data." This post walks through an example of downloading data from an API, extracting a few fields based on some conditions, and converting the results to a CSV using jq.

(2022-07-24) Figures and Captions Don't Appear as Expected with Default Export Options

I noticed that some of the formatting on this site was a little off and some of the org-mode components weren't being translated to HTML in quite the way I expected. Fixing this was simple, but finding the solution wasn't. In short, it was necessary to set the org-html-doctype to html5 (the default is xhtml-strict). Furthermore, I set org-html-html5-fancy to t. These ensure the org export process takes advantage of block elements offered in the html5 standard.

(2022-07-19) Basic Plotting in Julia

In this short post, I show one of the many ways of using Julia within emacs org mode, and will describe some of the basic plotting functionality in Julia.

(2022-02-08) Org Mode Headlines in Org Source Blocks

When writing about org mode, one often wants to show what particular org headline look like in terms of formatting, properties, tags, options, etc,. However, even within a babel org source block, an org header will be parsed and exported as a header. We can get around this by prepending the headline with a comma. The comma won't show up when exported: all that is exported is a nicely-formatted example of an org headline.

(2022-01-29) Task Repeaters in Org Mode

I recently started using org-mode to keep track of a few habits (morning meditation, getting some sunlight and exercise before my morning coffee, etc.) and needed to make use of org-mode's calendar features to do so. I've previously set deadlines and scheduled dates for my TODO entries, but have seldom used repeat intervals. My early attempts ( date +1d) worked fine but required some extra steps if I ever missed a day. This post discusses the .+ and ++ -style repeat intervals, which allow more control over what happens when you complete a task after the scheduled date.

(2021-12-11) Org Babel Source Blocks for R

Org Babel is one of the best tools available for literate programming. As a data scientist, I use it as a plain-text alternative to Jupyter notebooks. Org-mode files are much easier to track with version control and don't require the overhead of a browser. There are tradeoffs: Jupyter notebooks handle the display of different types of output (text results, images, interactive figures, etc.) in a way that is both seamless and visually appealing. Displaying figures at all can be a challenge when getting started with org-babel. This post covers the basics of using org-babel for common data science tasks in R.

(2021-12-03) Made with Org-Mode

I finally made a personal site using org-mode's built-in ox-publish exporter.

I've written my personal website with org-mode for years (it is, after all, one of the most reasonable markup languages to use for text). But until this point, I've used Hugo (with the ox-Hugo exporter). It worked fine, but it always seemed just a little bit too complicated for my needs. I wanted to find something where I could basically understand all of the components and where the gap between my org-mode files and the published output was as small as possible. I wanted to focus more on the writing and less on understanding the framework.

(2021-12-02) Resources

Here are some resources to reference for building a simple site with org-mode. I've extensively used the sites listed as models for building the present site and expect to continue to reference them for some time.

(2021-05-14) A Simple PyTorch Model for the Numerai Tournament

This is another one from the archives. It covers how to train a basic PyTorch model for use in the Numerai tournament, at least as it was in May 2021. See the original post here.

(2021-02-13) Mapping Urban Heat by Census Tract in R

Another one from the archives–this is one of my projects from my time at the Guinn Center, and something I very much wish I could have developed further: an analysis of urban heat in Las Vegas.

(2019-06-14) Book Review: Machine Learning Yearning by Andrew Ng

This is the first of a few posts I'm migrating from my old site, which you can still find here. This is a review of Machine Learning Yearning by Andrew Ng.

Emacs 27.1 (Org mode 9.3)