Lecture Noter
🌍 Try it here: https://lecture-noter-api-production.up.railway.app/
Motivation
I am on holiday but still want to keep learning, especially since I will soon be onboarding to a new job.
There are many great YouTube lectures, but watching them without taking notes is not very efficient: knowledge just goes in one ear and out the other. However, taking notes takes time and effort.
I decided to build a tool for my own learning, to see whether LLMs can help me take notes more efficiently.
I started with a simple pipeline run locally on the CS336 course (Language Modeling from Scratch) by Prof. Percy Liang and the Stanford CS336 team. The output is a blog post following the al-folio format so that I can easily publish it to my blog. The result is quite promising: when I watch the lecture again, the summary is accurate, reflects the key discussions, and matches what I would want to capture myself.
I shared the result on LinkedIn and it got some attention from my network. Given this small success, I decided to build a web app to make the tool more accessible to the public. This is where the idea of Lecture Noter came from.
This web app lets users simply paste in the URL of the YouTube video they want notes on, and the app renders the notes in a blog post format.
At the moment, the app is still in the early stages of development. It does not require any registration or login. It uses my own OpenAI API key, until I can no longer afford it. Hopefully, this web app can bring some value to the community.
I also tried to optimize the pipeline to reduce the waiting time for the user. However, the processing time is still quite long, with two main bottlenecks:
- Generating the summary and refining it takes 2-3 minutes on average, because the output is quite large (around 12k input tokens and 4k output tokens).
- Downloading the frames (to illustrate the key discussions) takes a long time, with around 20-30 frames per hour of video.
Cost estimation:
- 1 hour of video: around 12k tokens per input and 4k tokens per output.
- GPT-5-mini: $0.25 per 1M input tokens and $2.00 per 1M output tokens.
- 1 hour of video API cost: $0.25 * 12k / 1M + $2.00 * 4k / 1M = $0.011
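The arithmetic above can be double-checked in a few lines of Python (token counts and prices copied from the bullets):

```python
input_tokens = 12_000             # ~tokens per 1 hour of transcript
output_tokens = 4_000             # ~tokens in the generated summary
input_price = 0.25 / 1_000_000    # $ per input token (GPT-5-mini)
output_price = 2.00 / 1_000_000   # $ per output token (GPT-5-mini)

cost = input_tokens * input_price + output_tokens * output_price
print(f"${cost:.3f}")  # -> $0.011
```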
Offline development
The offline development is quite simple. The pipeline is composed of the following steps:
- Get transcript: Fetch the lecture transcript using the YouTube Transcript API.
- Summary: Call OpenAI's model (default is gpt-5-mini) through the API to summarize the transcript, outputting a list of key discussions with detailed explanations and a timestamp associated with each discussion.
- Refine: Call the model again to refine the detailed explanations into a blog-post-friendly format (highlighting, itemizing, etc.).
- Download frames: With the timestamps and the YouTube video URL, capture the relevant frames from the lecture video to illustrate the key discussions. This is done by using the youtube-dl tool to download the video (only the specific segment around each timestamp) and then the ffmpeg tool to extract the frames.
- Generate mind map: Another model call to generate a mind map from the refined content, output in Mermaid format so it can be rendered in the blog post.
- Generate blog post: Put all the content and frames together and generate the blog post following the al-folio format.
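As a minimal sketch, the steps above can be wired together roughly like this. Every helper here is a hypothetical stub standing in for the real transcript / OpenAI / youtube-dl calls, just to show the control flow:

```python
def get_transcript(video_url):
    # Real pipeline: YouTube Transcript API
    return "transcript text"

def summarize(transcript):
    # Real pipeline: OpenAI call returning key discussions with timestamps
    return [{"topic": "intro", "start": 0, "end": 60, "notes": "..."}]

def refine(discussions):
    # Real pipeline: second OpenAI call for blog-friendly formatting
    return discussions

def download_frames(discussions, video_url):
    # Real pipeline: youtube-dl segment download + ffmpeg frame extraction
    return {d["topic"]: f"frames/{d['topic']}.jpg" for d in discussions}

def generate_mind_map(discussions):
    # Real pipeline: model call producing Mermaid output
    return "mindmap\n  root((Lecture))"

def build_post(discussions, frames, mind_map):
    # Real pipeline: assemble an al-folio blog post
    return {"discussions": discussions, "frames": frames, "mindmap": mind_map}

def run_pipeline(video_url):
    transcript = get_transcript(video_url)
    discussions = refine(summarize(transcript))
    frames = download_frames(discussions, video_url)
    return build_post(discussions, frames, generate_mind_map(discussions))
```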
Parallel processing
One of the main bottlenecks of the pipeline is the time taken to download the video frames that illustrate the key discussions. With an average of 10-20 key discussions, i.e., 10-20 frames to download, and each frame taking about 3-5 seconds, a sequential pipeline would make the user wait a long time for the result. To solve this, I redesigned the pipeline to parallelize the frame downloading. More specifically, after the summary is refined and the mind map is generated, the website is rendered so that the user can see the result immediately, while the frames are downloaded in the background. Each frame has its own placeholder on the page and is swapped in once its download finishes.
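A sketch of the parallel download using Python's concurrent.futures; `download_frame` is a hypothetical stand-in for the youtube-dl + ffmpeg step, and in the real app this work runs as background tasks rather than inline:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def download_frame(timestamp):
    # Stand-in for the real youtube-dl + ffmpeg extraction; just
    # returns a fake file path so the sketch is runnable.
    return f"frames/{timestamp}.jpg"

def download_all(timestamps, max_workers=8):
    """Download frames concurrently; each result replaces a placeholder."""
    frames = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_frame, t): t for t in timestamps}
        for fut in as_completed(futures):
            frames[futures[fut]] = fut.result()  # placeholder -> real frame
    return frames
```

With 10-20 frames at 3-5 seconds each, running them concurrently collapses minutes of sequential waiting into roughly the duration of the slowest single download.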
Refine the image illustration (Work in progress)
A limitation of the current pipeline is that the illustration for each key discussion is chosen by taking the middle frame between the discussion's start and end timestamps. Some discussions can be a minute long, i.e., 60 candidate frames (at 1 frame per second), yet only a few of them are actually relevant to the discussion while the rest are just the lecturer talking. While the summary is quite accurate, showing these irrelevant frames is annoying and may leave the user with a negative impression.
So I decided to modify the pipeline with two steps:
- Step 1: Detect the relevance of the frames to the discussion, using a VLM as a judge: given a frame and a discussion, the VLM scores how relevant the frame is to the discussion. If the score is below a threshold, the frame is replaced with a generated one.
- Step 2: Generate the image illustration for the discussion.
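The two steps can be sketched as below; `score_relevance` is a stub for the VLM-as-judge call, `generate_image` is a stub for the image generator, and the 0.5 threshold is an assumption:

```python
RELEVANCE_THRESHOLD = 0.5  # assumed cutoff; would be tuned in practice

def score_relevance(frame_path, discussion):
    # Stand-in for the VLM-as-judge call: given a frame and a
    # discussion, return a relevance score in [0, 1].
    return 0.9 if "attention" in frame_path else 0.1

def generate_image(discussion):
    # Stand-in for Step 2: generate an illustration for the discussion.
    return f"generated/{discussion}.png"

def choose_illustration(frame_paths, discussion, threshold=RELEVANCE_THRESHOLD):
    # Step 1: score every candidate frame and keep the best one.
    scored = [(score_relevance(p, discussion), p) for p in frame_paths]
    best_score, best_frame = max(scored)
    if best_score >= threshold:
        return best_frame
    # Step 2: no frame is relevant enough, so fall back to generation.
    return generate_image(discussion)
```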
Online deployment
The first step is to develop a web app and run it locally to debug the pipeline (note that most of the code was generated by Cursor :D).
What I have learned
Celery for background tasks
Celery is used to:
- Run background tasks asynchronously.
- Distribute work: multiple workers can process jobs in parallel.
- Track progress: update the job status for the user, e.g., “50% complete”, “100% complete”, etc.
- Handle failures: automatic retries and error handling.
Without Celery:
User Request → API processes video (takes 5 minutes) → Response
↓
User waits 5 minutes... 😴
With Celery:
User Request → API creates task → Immediate Response ("Processing...")
↓
Background worker processes video
↓
User checks status or gets notification when done ✅
Core concepts of Celery:
- Task: A function decorated with @app.task that can be executed asynchronously.
- Broker: A message queue that stores tasks before workers pick them up.
- Worker: A process that monitors the broker and executes tasks. (There can be multiple workers to distribute the work.)
API → [Task] → Broker (Redis) → Worker picks up task → Executes
Minimal example code:
# tasks.py
from celery import Celery

app = Celery('myapp', broker='redis://localhost:6379')

@app.task
def process_video(video_url):
    # Your existing pipeline code here
    transcript = get_transcript(video_url)
    summary = summarize(transcript)
    return summary

# api.py
from fastapi import FastAPI
from tasks import process_video

app = FastAPI()

@app.post("/jobs")
def create_job(video_url: str):
    task = process_video.delay(video_url)  # Returns immediately!
    return {"task_id": task.id}
Run it with:
celery -A tasks worker --loglevel=debug
FastAPI
What is FastAPI?
FastAPI is a modern, fast Python web framework for building APIs. Think of it as the “waiter” in a restaurant:
- Takes requests from clients (users/browsers)
- Validates the orders (input data)
- Passes work to the kitchen (your business logic/Celery)
- Returns results to the customer
Basic FastAPI example code:
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read_root():
    return {"message": "Hello World"}

@app.get("/health")
def health_check():
    return {"status": "healthy"}
Run it with:
pip install fastapi uvicorn
uvicorn main:app --reload
Visit http://127.0.0.1:8000 to see the result {"message": "Hello World"}. Visit http://127.0.0.1:8000/health to see the result {"status": "healthy"}.
Cache management
To avoid re-processing the same video multiple times, I designed a simple cache management system using Redis. The cache stores the transcript, summary, and frames for each video, keyed by the video id and the AI model version. When a new video is requested, the system first checks whether that video-id and ai-model-version pair is already in the cache, and serves the cached result if so.
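A sketch of the cache lookup, with a plain dict standing in for Redis so the example is self-contained; the `lecture-noter:` key scheme is an assumption:

```python
import json

class VideoCache:
    """Cache keyed on (video id, model version).

    A dict stands in for Redis here; the real system would call
    redis get/set with the same keys and JSON-encoded values.
    """

    def __init__(self):
        self.store = {}

    def key(self, video_id, model_version):
        # One entry per (video, model version) pair, so upgrading the
        # model naturally triggers a fresh run instead of a stale hit.
        return f"lecture-noter:{video_id}:{model_version}"

    def get(self, video_id, model_version):
        raw = self.store.get(self.key(video_id, model_version))
        return json.loads(raw) if raw is not None else None

    def set(self, video_id, model_version, payload):
        self.store[self.key(video_id, model_version)] = json.dumps(payload)
```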