Creation of GoodNotes LM

Our Client
Goodnotes is a leading digital note-taking platform used by millions of people worldwide. Launched in 2011, Goodnotes started as an improvement to physical paper notes, introducing the ability to take handwritten digital notes, search handwritten text, and organize everything into a digital library.
What We Did
Over the past 6 weeks, we created a proof-of-concept document analysis and outline generation testing platform.
Meet the Team
Timeframe
January — March 2025
Tools
Design: Figma, Jitter
Development: Next.js, Vercel AI SDK, Pytesseract
Workflow: Notion, Slack, GitHub
The Product
Disclaimer: This is a student collaboration project created by CodeLab at UC Davis in partnership with Goodnotes. Not an official Goodnotes product.
Who are our users?
Our target audience was predominantly, but not limited to, college students and everyday note-takers. We conducted a survey to understand the highlights and pain points students had with Goodnotes and other note-taking apps, with questions ranging from AI chatbot experiences to how students organize their notes. From this research, we identified three main findings, summarized in our research synthesis below.
User Research Synthesis
Other issues brought to our attention included the monthly payment, the inaccuracy of the search bar, and the limited number of notebooks. These seemed to be accessibility issues, so we also had to consider them when building the priority/feature matrix.
With these in mind, we conducted 6 user interviews with UC Davis students whose majors ranged from Computer Science to Human Biology. They expanded on their thought processes and shed light on features and new ideas they thought would be useful for Goodnotes Experimental LM. Through these interviews, we gathered many suggestions and comments, along with a better understanding of the students' relationship with school and AI.
Competitive Analysis
We explored various document analysis tools, from NotebookLM to Notion, to understand the industry UX standards for these types of applications and what features already exist. Our key findings were that our application should have an easy upload process and a simple analysis interface, and should expand on existing AI features for our users' specific use cases, such as reviewing lecture notes.
Defining User Flows and Features
After conducting our research, we ideated on possible solutions that would be useful for our target audience. We organized these solutions into a prioritization matrix based on effort and impact. From there, we knew we wanted to focus on a context aware AI chatbot, PDF outlining, and document classification tags as our main features.
We also mapped out all of our features into a user flow to define how a user might navigate through our product. Our user flow begins on the homepage where a user can upload new documents. The flow then goes through the document analysis page where the user is able to access the AI chatbot and view their PDF outline.
Initial Wireframes
Our initial wireframes reflected a flow that connected the backend client needs with our understanding of how these features would be implemented. We communicated with our client to clarify exactly which features he wanted in the project and the possible setbacks from a developer point of view, along with understanding each flow's purpose and why a user would need it in the first place.
We decided to divide and conquer the different flows and focus on one feature at a time, while also finding ways to incorporate the other features from the matrix. Our main focuses were the AI chatbot, segmentation, and outline generation.
We took inspiration from flows in DocSumo, NotebookLM, Adobe Acrobat, and the current Goodnotes app. In this phase we also experimented with different loading screens, information hierarchy, and metadata formats.
Iterations
Our hi-fidelity prototypes went through multiple iterations as we refined our designs. Throughout these iterations we focused on improving the visual hierarchy of our components by experimenting with various colors and layouts. Additionally, to assist users, we added brief descriptions of each AI model's functions to simplify the process of selecting models.
Design System
We took inspiration from Goodnotes' existing style guides and adapted the logo, typography, colors, and certain components to create a new but recognizable design. We used Goodnotes' main blue color to emphasize buttons, and utilized their secondary colors for other standout elements like the document tags. Our design maintained consistency through a structured 8pt grid system and reusable UI components. We also ensured accessibility with proper color contrast and legible typography while integrating smooth transitions and animations for a seamless user experience.
Landing Page
The landing page allows users to upload new PDFs to be analyzed and to view any analyses they have previously saved. We used a card layout to organize users' saved outlines. Within each card, the user can view the classification of their document. Users can also add their own custom secondary tags to personalize the way they organize their notes.
Outline Generation
The Outline Generation sidebar visually showcases headings/subheadings from the uploaded document. That way, the user is able to visualize and understand the main points from the document in a digestible way. The sidebar also has an overall Summary of the document underneath the Outline Tab. The content and information on this feature may change as the user chooses different AI models that fit their learning needs.
Document Chat
The document chat feature allows users to ask questions and receive answers based on the document's content, or choose preset prompts based on the common use cases identified during our user interviews, such as generating a summary. With segmentation enabled, users can also ask section-specific questions, allowing them to receive more focused answers and making it easier to understand complex content and retain key information.
Multi-Model Comparison
The Multi-Model Comparison feature allows users to analyze and compare insights from multiple AI models side by side. This enables a deeper understanding of document content by evaluating different interpretations, summaries, and responses.
Segmentation Analysis
The Segmentation Analysis tool visually chunks the document's content for the user. By segmenting the document into different sections, the user is able to ask questions about a specific section. This narrows the scope and details the model has to work with, so the user can ask more detail-oriented questions and gets a clear visual of where in the document the information is being referenced from.
Micro-Interactions
We included several micro-interactions in our prototype to provide visual feedback to users. When uploading a new document, a rotating animation plays as the document gets analyzed. When re-analyzing a document, a skeleton screen appears over the left sidebar to indicate that the new analysis is loading, and a loading bar shaped like a Goodnotes-inspired loop appears while the new model is applied. We also included popups that briefly appear when a user successfully applies a new AI model or saves their analysis.
Segmentation Research and Findings
Aryn AI SDK (DocParse)
Aryn AI provides a tool called DocParse that returns a partitioned PDF, with OCR support included. For typed documents — particularly textbooks or other well-structured typed notes — it performs well, returning bounding boxes that separate out different areas of the page. However, it struggles with handwritten documents, typically treating them as simple images or failing to parse them accurately.
Strengths:
- Highly efficient segmentation for typed text.
- Straightforward usage through a Python SDK.
- Ability to return bounding-box-based segmentation results.
Weaknesses:
- Handwritten notes are not parsed effectively.
- Free tier limit of 200 pages per month, requiring payment once that limit is exceeded.
- Partial or limited functionality when using the API endpoint exclusively (e.g., might only return JSON, not a fully segmented PDF).


PyMuPDF
PyMuPDF was tested to extract information from PDFs and convert the extracted data into a Markdown file. It can also convert the extracted data into a LlamaIndex document and save it, making integration with certain text-based pipelines more convenient.
Performance Notes:
- Text extraction is straightforward.
- Image extraction is less robust.
- Best used in combination with other tools (e.g., PDF.js or Tesseract) if image or complex layout parsing is needed.
Markdown output for a sample textbook page: effective at extracting the text, but spatial data is lost, so it doesn't help with visual segmentation.
PDF.js for Image Extraction:
PDF.js is a powerful library that can convert multi-page PDFs into images. In testing, it successfully converted a multi-page PDF into a ZIP file of PNG images in roughly 15 seconds.
Key Takeaways:
- Efficient and reliable for page-by-page extraction.
- Ideal for producing images that can then be fed into an OCR workflow.
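As a point of reference, a minimal browser-side sketch of this conversion with pdfjs-dist might look like the following; the helper name and scale factor are ours, not the project's production code:
import * as pdfjsLib from "pdfjs-dist";
// Note: in a real app, pdfjsLib.GlobalWorkerOptions.workerSrc usually needs to be configured.
// Hypothetical helper: render every page of a PDF (ArrayBuffer) to a PNG data URL.
async function pdfToPngDataUrls(data: ArrayBuffer, scale = 2): Promise<string[]> {
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  const pages: string[] = [];
  for (let pageNum = 1; pageNum <= pdf.numPages; pageNum++) {
    const page = await pdf.getPage(pageNum);
    const viewport = page.getViewport({ scale });
    const canvas = document.createElement("canvas");
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    const context = canvas.getContext("2d")!;
    await page.render({ canvasContext: context, viewport }).promise;
    pages.push(canvas.toDataURL("image/png")); // one PNG per page
  }
  return pages;
}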
Other Approaches and Models
Open-Source Layout Analysis (PDF Document Layout Analysis)
An open-source model on Hugging Face under the repository HURIDOCS/pdf-document-layout-analysis. It aims to provide DocParse-like functionality by analyzing and segmenting PDFs.
Pros:
- Similar functionality to Aryn AI's DocParse.
- Docker-based and open-source, allowing self-hosted solutions.
Cons:
- Sparse documentation, making implementation challenging.
- Ineffective for handwritten documents.
- Potentially time-consuming to integrate due to lack of step-by-step guides.


Deepseek
Another segmentation and classification tool that was difficult to test because the API was frequently down. Its performance and details remain inconclusive, pending more stable access.
Gemini (Classification)
Used for classifying segmented documents (e.g., identifying a PDF as "Artificial Intelligence Notes" or "Academic Research Paper").
- Displays a solid ability to handle PDF classification where other models struggle with reading or structuring data.
- May or may not benefit from segmentation, as classification alone can sometimes be performed on the raw PDF text.
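For illustration, a classification call like this could be wired up through the Vercel AI SDK (which is in our toolchain); the model id, schema fields, and prompt below are assumptions for the sketch rather than the project's actual code:
import { generateObject } from "ai";
import { google } from "@ai-sdk/google";
import { z } from "zod";
// Assumes GOOGLE_GENERATIVE_AI_API_KEY is set; model id and schema are illustrative.
const classifyDocument = async (documentText: string) => {
  const { object } = await generateObject({
    model: google("gemini-1.5-flash"),
    schema: z.object({
      classification: z.string().describe('e.g. "Artificial Intelligence Notes"'),
      contentTags: z.array(z.string()).max(5),
    }),
    prompt: `Classify this document and suggest content tags:\n\n${documentText.slice(0, 8000)}`,
  });
  return object; // { classification, contentTags }
};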
Tesseract (OCR) + PDF.js + Jimp
Tesseract is widely recognized as a leading open-source OCR (Optical Character Recognition) engine. Used in conjunction with PDF.js to extract PDF pages into images, Tesseract can then detect text within the images and provide bounding box data for segmentation.
Typed Text OCR:
- Highly accurate in extracting text from typed, clean PDFs.
- Generates bounding boxes at character, word, or line levels.
Handwritten Text OCR:
- Performance is heavily dependent on handwriting clarity.
- Recognizes some words, but the accuracy drops significantly with poor handwriting.
- Still provides bounding boxes, even if the text extraction is partially incorrect.
Visual Overlays and Bounding Boxes:
- Libraries such as Jimp can overlay bounding boxes onto images for clarity.
- The final annotated images (with bounding boxes drawn) can be recombined into a PDF, providing a "segmented" visual.
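To give a concrete sense of the bounding box data, here is a small tesseract.js sketch; it assumes the blocks → paragraphs → lines → words hierarchy the library returns when blocks: true is requested, and is for inspection only:
import { createWorker } from "tesseract.js";
// Sketch: OCR a single page image and log word-level text with bounding boxes.
const inspectPage = async (imagePath: string) => {
  const worker = await createWorker("eng");
  const { data } = await worker.recognize(imagePath, undefined, { blocks: true });
  for (const block of data.blocks ?? []) {
    for (const paragraph of block.paragraphs) {
      for (const line of paragraph.lines) {
        for (const word of line.words) {
          console.log(word.text, word.bbox); // bbox = { x0, y0, x1, y1 } in pixels
        }
      }
    }
  }
  await worker.terminate();
};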
Final Observations and Rationale
After exploring multiple solutions, two recurring challenges emerged:
- Limited Documentation of Open-Source Models: Some open-source segmentation projects lacked sufficient guides or instructions, making them difficult to deploy.
- Access Barriers: Certain specialized services provided no direct or timely access keys for testing, halting deeper investigation.
In comparison, Tesseract stood out due to its reliable performance, extensive community, active maintenance, and flexible usage (both in Python via pytesseract and in JavaScript/TypeScript via tesseract.js). While no single solution perfectly handled handwritten notes, Tesseract's open-source nature and ongoing development suggest a more robust path forward.
The final pipeline uses PDF.js (or equivalent) to convert pages to images, processes these images with Tesseract for OCR and bounding boxes, draws bounding boxes via Jimp, and optionally merges them back into a single annotated PDF. This approach resolves the primary requirement: extracting and highlighting textual structures from PDFs while maintaining open-source flexibility.
Implementation Details
Decoding the PDF
When a PDF file is received (base64-encoded), it must be converted to a usable binary form before any processing can occur. The following function strips out the MIME header and converts the data into a Node.js Buffer.
// Function to decode Base64 PDF
const decodeBase64 = (base64: string): Buffer => {
const base64Data = base64.replace(/^data:application\/pdf;base64,/, '');
return Buffer.from(base64Data, 'base64');
};
- Input: A base64-encoded string (often prefixed with `data:application/pdf;base64,`).
- Output: A Node.js `Buffer` containing raw PDF data.
Converting the PDF to Images
To perform OCR, the PDF must be turned into images, typically one image per page. Below, `pdftoppm` (from Poppler) is executed to generate PNG files. The image paths are collected and returned for the next step.
const execPromise = promisify(exec);
const convertPDFToImages = async (pdfBuffer: Buffer): Promise<string[]> => {
const tempDir = '/tmp';
const pdfPath = path.join(tempDir, `temp.pdf`);
const outputPath = path.join(tempDir, `page`);
// Write the incoming PDF to a temporary file
await fs.promises.writeFile(pdfPath, pdfBuffer);
// Convert PDF to PNG images (pdftoppm automatically numbers output pages)
const command = `pdftoppm -png ${pdfPath} ${outputPath}`;
await execPromise(command);
// Gather all generated PNG files named page-<pageNumber>.png
const files = fs.readdirSync(tempDir).filter(file => /^page-\d+\.png$/.test(file));
const imagePaths = files.map(file => path.join(tempDir, file));
// Clean up the original PDF once conversion is complete
await fs.promises.unlink(pdfPath);
return imagePaths;
};
- Write to Temp: The raw PDF is saved to /tmp/temp.pdf.
- Convert: pdftoppm -png produces image files (page-1.png, page-2.png, etc.).
- Collect Paths: The image file names are returned for use in the OCR phase.
- Cleanup: Removes the temporary PDF file to reduce clutter.
OCR with Tesseract
Next, Tesseract is utilized to extract text and bounding boxes from each of the PNG images. Here, a Tesseract worker is created, parameters are set (e.g., page segmentation mode), and each image is recognized.
import { createWorker, PSM } from 'tesseract.js';
const runOCR = async (imagePaths: string[]) => {
// Create and configure a Tesseract worker
const worker = await createWorker('eng');
await worker.setParameters({
tessedit_pageseg_mode: PSM.AUTO_ONLY,
});
let results = [];
for (let i = 0; i < imagePaths.length; i++) {
const imagePath = imagePaths[i];
const { data } = await worker.recognize(imagePath, undefined, { blocks: true });
// Each page's recognized blocks are stored for further reference
results[i] = data.blocks;
// Additional step for bounding box annotation (see next section)
// ...
}
await worker.terminate();
return { ocrResults: results, imagePaths };
}
- Initialization: createWorker('eng') indicates the English language is used; the page segmentation mode (PSM.AUTO_ONLY) instructs Tesseract to automatically segment the page.
- Recognition: worker.recognize(imagePath, undefined, { blocks: true }) returns recognized text blocks, each with bounding box data.
- Results: The recognized blocks for each page are stored in an array.
Drawing Bounding Boxes
Once Tesseract provides the bounding box coordinates, the following function draws rounded sky-blue rectangles on the original page image using Jimp, highlighting the recognized text on the PDF page.
import { Jimp, rgbaToInt } from 'jimp';
const drawBoundingBoxes = async (imagePath: string, blocks: any[]) => {
const image = await Jimp.read(imagePath);
blocks.forEach(block => {
const { bbox } = block;
const { x0, y0, x1, y1 } = bbox;
const skyBlue = rgbaToInt(187, 203, 221, 255);
const borderThickness = 4;
const borderRadius = 13;
// Top and bottom edges
for (let x = x0; x <= x1; x++) {
for (let t = 0; t < borderThickness; t++) {
const inLeftCorner = x - x0 < borderRadius;
const inRightCorner = x1 - x < borderRadius;
if (!inLeftCorner && !inRightCorner) {
image.setPixelColor(skyBlue, x, y0 + t);
image.setPixelColor(skyBlue, x, y1 - t);
}
}
}
// Left and right edges
for (let y = y0; y <= y1; y++) {
for (let t = 0; t < borderThickness; t++) {
const inTopCorner = y - y0 < borderRadius;
const inBottomCorner = y1 - y < borderRadius;
if (!inTopCorner && !inBottomCorner) {
image.setPixelColor(skyBlue, x0 + t, y);
image.setPixelColor(skyBlue, x1 - t, y);
}
}
}
// Corner arcs
for (let i = 0; i <= borderRadius; i++) {
for (let j = 0; j <= borderRadius; j++) {
const dist = i * i + j * j;
const outer = borderRadius * borderRadius;
const inner = (borderRadius - borderThickness) * (borderRadius - borderThickness);
if (dist >= inner && dist <= outer) {
// Top-left
image.setPixelColor(skyBlue, x0 + i, y0 + j);
// Top-right
image.setPixelColor(skyBlue, x1 - i, y0 + j);
// Bottom-left
image.setPixelColor(skyBlue, x0 + i, y1 - j);
// Bottom-right
image.setPixelColor(skyBlue, x1 - i, y1 - j);
}
}
}
});
const annotatedPath = `${imagePath.replace('.png', '_annotated')}.png`;
await image.write(annotatedPath as `${string}.${string}`);
return annotatedPath;
};
- Read the Image: Loads the image using Jimp.
- Drawing Logic: Each recognized block has bounding box coordinates (x0, y0, x1, y1). Horizontal and vertical edge lines are drawn in a loop to create a rounded rectangle, and each affected pixel is set to a sky blue color as per the design scheme.
- Output: Saves a new annotated PNG image with _annotated appended to the file name, returning the path to this new file.
Recreating PDFs from Annotated Images
After bounding box annotation, each annotated PNG image can be turned back into a one-page PDF. This step uses pdf-lib to embed the images into PDF pages.
import { PDFDocument } from 'pdf-lib';
import fs from 'fs';
const convertImagesToPDFs = async (imagePaths: string[]): Promise<string[]> => {
const pdfPaths: string[] = [];
for (const imagePath of imagePaths) {
const pdfPath = imagePath.replace('.png', '.pdf');
const pdfDoc = await PDFDocument.create();
const imageBytes = await fs.promises.readFile(imagePath);
// Embed the annotated PNG
const img = await pdfDoc.embedPng(imageBytes);
const page = pdfDoc.addPage([img.width, img.height]);
// Draw the embedded image onto the PDF page
page.drawImage(img, { x: 0, y: 0, width: img.width, height: img.height });
// Save the PDF
const pdfBytes = await pdfDoc.save();
await fs.promises.writeFile(pdfPath, pdfBytes);
pdfPaths.push(pdfPath);
// Remove the original (annotated) PNG to clean up
await fs.promises.unlink(imagePath);
}
return pdfPaths;
}
- Create a PDFDocument: PDFDocument.create() starts a new PDF file in memory.
- Embed Image: pdfDoc.embedPng(imageBytes) reads the PNG data and prepares it for placement.
- Add Page and Draw Image: The page dimensions are set to match the image width and height for a 1:1 overlay.
- Cleanup: Original PNG files are removed to maintain cleanliness.
Merging PDFs
Since each page is now an individual PDF, they must be merged to form a single multi-page annotated PDF. Below, each single-page PDF is opened, its pages are copied, and appended to a new master PDF.
const mergePDFs = async (pdfPaths: string[]): Promise<string> => {
const mergedPdf = await PDFDocument.create();
for (const pdfPath of pdfPaths) {
const pdfBytes = await fs.promises.readFile(pdfPath);
const pdfDoc = await PDFDocument.load(pdfBytes);
// Copy all pages from the single-page PDF
const copiedPages = await mergedPdf.copyPages(pdfDoc, pdfDoc.getPageIndices());
copiedPages.forEach(page => mergedPdf.addPage(page));
// Remove the temporary single-page PDF
await fs.promises.unlink(pdfPath);
}
// Final merged PDF in memory
const mergedPdfBytes = await mergedPdf.save();
return Buffer.from(mergedPdfBytes).toString('base64');
};
- Create Master PDF: PDFDocument.create() again starts an empty PDF.
- Copy and Append Pages: mergedPdf.copyPages(pdfDoc, pdfDoc.getPageIndices()) duplicates pages from each single-page PDF and adds them to the master file.
- Output: The final merged PDF is returned as a base64-encoded string.
The Main Route
Putting all the above steps together in one endpoint, we have a single POST handler that:
- Receives a base64-encoded PDF.
- Decodes it.
- Converts each page to an image.
- Runs OCR, draws bounding boxes, and saves annotated images.
- Recreates PDFs from annotated images.
- Merges them into a single PDF.
- Returns the final PDF (base64-encoded) and OCR results.
Below is the complete route, showcasing how each function is called in sequence:
import { NextResponse } from 'next/server';
export async function POST(req: Request) {
const { pdfBase64 } = await req.json();
if (!pdfBase64) {
return NextResponse.json({ error: 'Missing PDF data' }, { status: 400 });
}
try {
// 1. Decode the incoming base64-encoded PDF
const pdfBuffer = decodeBase64(pdfBase64);
// 2. Convert PDF pages to images
const imagePaths = await convertPDFToImages(pdfBuffer);
// 3. Perform OCR and draw bounding boxes
const { ocrResults, imagePaths: annotatedImagePaths } = await (async () => {
const worker = await createWorker('eng');
await worker.setParameters({ tessedit_pageseg_mode: PSM.AUTO_ONLY });
let results = [];
for (let i = 0; i < imagePaths.length; i++) {
const imagePath = imagePaths[i];
const { data } = await worker.recognize(imagePath, undefined, { blocks: true });
results[i] = data.blocks;
// Draw bounding boxes on each image
let annotatedImagePath = imagePath;
if (data.blocks) {
annotatedImagePath = await drawBoundingBoxes(imagePath, data.blocks);
}
imagePaths[i] = annotatedImagePath;
}
await worker.terminate();
return { ocrResults: results, imagePaths };
})();
// 4. Convert the annotated images into PDF files
const pdfPaths = await convertImagesToPDFs(annotatedImagePaths);
// 5. Merge the single-page PDFs into one
const finalPdfBase64 = await mergePDFs(pdfPaths);
// Return the merged PDF along with raw OCR results (bounding boxes, text, etc.)
return NextResponse.json({ pdfBase64: finalPdfBase64, ocrResults }, { status: 200 });
} catch (error) {
console.error(error);
return NextResponse.json({ error: 'Error processing PDF' }, { status: 500 });
}
}
- Validate Input: Checks for pdfBase64 in the request body.
- Decode and Convert: Uses the helper functions described above.
- OCR & Annotation: A Tesseract worker is created, each page is recognized, and bounding boxes are drawn.
- PDF Reconstruction: Annotated images are converted back into single-page PDFs.
- Merging PDFs: All PDFs are combined into a single final file, returned in base64.
- Error Handling: If any step fails, the route responds with a 500 status and logs the error.
Segmentation Takeaways
After an extensive survey of commercial and open-source solutions:
- Commercial Solutions (e.g., Aryn AI): Provide accurate segmentation for typed text but often require paid subscriptions and perform poorly on handwritten documents.
- Open-Source Tools (e.g., Tesseract, PDF Document Layout Analysis): Offer cost-effectiveness and flexibility but may lack documentation or have implementation hurdles.
- Optimal Pipeline: Leveraging Tesseract (for OCR), PDF.js or Poppler utilities (for PDF-to-image conversion), and Jimp (for bounding box annotation) offers a powerful end-to-end workflow. This is especially practical for typed documents and can accommodate handwritten notes to some extent, albeit with lower accuracy.
The final approach detailed above systematically addresses PDF segmentation, OCR, and bounding-box annotation. It takes in a PDF, processes it page by page, and returns a fully annotated PDF. While handwritten recognition remains a universal challenge, Tesseract's active development and extensive support make it a strong long-term solution compared to less-documented or subscription-limited services.
Union & Multi-Model Pipeline
Union — Fusing many model outputs into one
Running several LLMs on the same PDF lets us capture their complementary strengths.
The Union layer merges these outputs so we retain every unique insight while stripping near-duplicates.
Merge algorithm
Collect responses: The front-end calls /api/generate-quiz for each chosen model → an array of documentAnalysisSchema objects.
Deduplicate tags
areTagsSimilar = (a, b) => levenshteinDistance(a, b) <= 2;
Deduplicate outline bullets
areOutlinePointsSimilar = (a, b) => levenshteinDistance(a, b) <= 3;
Build a "super-summary"
- split every summary into sentences
- drop sentences < 20 chars
- keep the first 5 sentences that differ by > 10 edits from any sentence already kept.Return one canonical object
return {
...responses[0], // template
summary: unionSummary,
layout_evidence: {
...responses[0].layout_evidence,
outline_points: unionOutlinePoints,
},
content_tags: unionTags,
};
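The "super-summary" step can be sketched roughly as below; the sentence splitting and the helper name are ours, and the production logic may differ:
// Hypothetical sketch of the super-summary construction described above.
const buildUnionSummary = (summaries: string[]): string => {
  const sentences = summaries
    .flatMap(summary => summary.split(/(?<=[.!?])\s+/)) // naive sentence split
    .map(sentence => sentence.trim())
    .filter(sentence => sentence.length >= 20);          // drop short fragments
  const kept: string[] = [];
  for (const sentence of sentences) {
    if (kept.length >= 5) break;                         // keep at most 5 sentences
    const nearDuplicate = kept.some(existing => levenshteinDistance(existing, sentence) <= 10);
    if (!nearDuplicate) kept.push(sentence);
  }
  return kept.join(" ");
};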
Levenshtein distance — why & how we use it
Example
SUNDAY → SATURDAY // 2 insertions + 1 substitution = distance 3
| Field | Typical length | Threshold | Goal |
| --- | --- | --- | --- |
| Tags | 5-20 chars | 2 | Catch typos/plurals ("back-prop") |
| Outline bullets | < 50 chars | 3 | Merge minor wording changes |
| Summary sentences | 50-120 chars | 10 | Allow re-phrasing, block new ideas |
Complexity O(|a| × |b|) per comparison — trivial for these short strings.
If semantic duplicates ("car" vs "automobile") become common, we'll switch to SBERT embeddings.
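For completeness, the edit-distance helper behind these thresholds can be implemented with the standard dynamic-programming recurrence; this is a sketch, not necessarily our exact code:
// Classic Levenshtein distance, O(|a| × |b|) time and space.
const levenshteinDistance = (a: string, b: string): number => {
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,       // deletion
        dp[i][j - 1] + 1,       // insertion
        dp[i - 1][j - 1] + cost // substitution
      );
    }
  }
  return dp[a.length][b.length];
};
console.log(levenshteinDistance("SUNDAY", "SATURDAY")); // 3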
Future improvements
- SBERT-based semantic similarity in place of edit distance.
- Conflict table when models disagree on document_type.
Multiple-Model Flow
- Model picker: Users tick any mix of GPT-4, GPT-3.5, Claude-3 (Opus/Sonnet), Gemini-2 Flash.
- Re-Analyze — All selected models run in parallel:
const responses = await Promise.all(
selectedModels.map(model =>
fetch("/api/generate-quiz", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ files: encodedFiles, modelName: model })
}).then(r => r.json())
)
);
setAnalysis(selectUnion(responses));
- Union merge — Canonical analysis drives the sidebar + chat.
- Loading overlay disappears once every model resolves.
- Future Work: Retry/back-off on 429 or 503 rate limits, and telemetry for latency, token usage, or cost. These are planned for a later milestone.
Document Storage
Server-Side Encryption: Initial Data Protection
Securely storing sensitive documents in the cloud requires robust encryption. Server-side encryption was chosen over client-side to mitigate potential security risks and complexities. Before diving into the detailed implementation of document storage, let's take a look at a diagram that summarizes the process:
Encryption keys are generated and stored securely using HttpOnly cookies, rendering them inaccessible to client-side JavaScript. Upon document upload, the system verifies the existence of an encryption key. If absent, a new 256-bit key is generated using Node's crypto library and stored in an HttpOnly cookie.
if (!encryptionKey) {
encryptionKey = crypto.randomBytes(32).toString("hex");
cookies().set("client_document_encryption_key", encryptionKey, { httpOnly: true, secure: true });
}
Encryption occurs within a Next.js server action (storeDocumentFile), ensuring a controlled server-side environment. The uploaded file is retrieved from form data, and file size limits are enforced via client-side validation and Supabase's inherent limitations (50MB).
const file = formData.get("file") as Blob;
if (!file) {
throw new Error("No file uploaded.");
}
Utilizing AES-256-GCM, the file is encrypted with a randomly generated Initialization Vector (IV). The IV, authentication tag, and ciphertext are concatenated for secure storage. Error handling is implemented to manage potential upload failures to Supabase. Encrypted data is stored in a Supabase storage bucket named "documents," with filenames generated as random UUIDs to prevent unauthorized access to unencrypted data. Supabase's dashboard manages access policies, and its built-in handling supports large files.
const { data, error } = await client.storage.from("documents").upload(randomUUID(), encryptedFile);
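The AES-256-GCM step itself is summarized rather than shown above; a minimal sketch of how the IV, auth tag, and ciphertext could be packed into one buffer (the helper name and byte layout are assumptions) is:
import crypto from "crypto";
// Hypothetical sketch: encrypt a Buffer with AES-256-GCM and concatenate IV + auth tag + ciphertext.
const encryptBuffer = (plaintext: Buffer, hexKey: string): Buffer => {
  const key = Buffer.from(hexKey, "hex"); // 32-byte key from the HttpOnly cookie
  const iv = crypto.randomBytes(12);      // 96-bit IV, the recommended size for GCM
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  const authTag = cipher.getAuthTag();    // 16-byte authentication tag
  return Buffer.concat([iv, authTag, ciphertext]);
};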
Metadata, including FID, title, size, type, and timestamp, is stored locally using Dexie.js for rapid access and offline functionality.
await addDocument({
fid,
title: file.name.substring(0, file.name.length - 4),
type: file.type,
size: file.size,
timestamp: new Date(),
analysis: defaultAnalysis,
});
Document Retrieval and Decryption: Accessing Protected Data
Encrypted files are retrieved from Supabase using the FID. Decryption employs the encryption key from the HttpOnly cookie. The IV and authentication tag are extracted from the encrypted blob, and the ciphertext is decrypted using AES-256-GCM. Error handling is implemented via a try/catch block.
const { iv, authTag, ciphertext } = extractComponents(encryptedBuffer);
const decryptedBuffer = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
return new Blob([decryptedBuffer]);
Note: Authorization for decryption relies on the secure HttpOnly cookie.
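A matching decryption sketch, assuming the same IV (12 bytes) + auth tag (16 bytes) + ciphertext layout as the encryption sketch above; the real extractComponents helper may split the blob differently:
import crypto from "crypto";
// Hypothetical counterpart: unpack the components and decrypt with AES-256-GCM.
const decryptBuffer = (encrypted: Buffer, hexKey: string): Buffer => {
  const iv = encrypted.subarray(0, 12);
  const authTag = encrypted.subarray(12, 28);
  const ciphertext = encrypted.subarray(28);
  const decipher = crypto.createDecipheriv("aes-256-gcm", Buffer.from(hexKey, "hex"), iv);
  decipher.setAuthTag(authTag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
};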
Local-First Power with Dexie.js:
To provide a responsive user experience with offline access, we use Dexie.js to store metadata locally, enabling fast data retrieval. React Context facilitates access to Dexie methods. The Dexie database uses versioning, with version 2 introducing an analysis.status field; an upgrade callback handles migration of existing records.
All Dexie queries are wrapped with a withTiming helper to log performance metrics. We captured speeds as low as 0.37ms for document metadata retrieval and 10ms for adding new document metadata to the database, following an average database initialization time of around 70ms on initial page load.

Database Schema diagram
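A rough sketch of how this versioned schema could be declared with Dexie.js; the table fields other than analysis.status are illustrative:
import Dexie, { Table } from "dexie";
// Sketch of a versioned Dexie database with an upgrade callback for analysis.status.
class GoodnotesLMDB extends Dexie {
  documents!: Table<{ id?: number; fid: string; title: string; analysis: { status: string } }>;
  constructor() {
    super("goodnotes-lm");
    this.version(1).stores({ documents: "++id, fid, title" });
    // Version 2 adds an index on analysis.status and migrates existing records.
    this.version(2)
      .stores({ documents: "++id, fid, title, analysis.status" })
      .upgrade(tx =>
        tx.table("documents").toCollection().modify(doc => {
          doc.analysis = doc.analysis ?? { status: "COMPLETE" };
        })
      );
  }
}
export const db = new GoodnotesLMDB();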
All Analysis View (Homepage)
The "All Analysis" view, or homepage, serves as the central hub for users to interact with their documents. We aimed to create a seamless and intuitive experience, focusing on efficient search, flexible sorting, and easy access to document metadata.
Search and Sorting: Prioritizing User Efficiency
As the number of documents grows, finding specific files becomes increasingly difficult. We needed to implement robust search and sorting features to streamline this process. We chose Fuse.js for its powerful fuzzy search capabilities and Dexie.js for its efficient local database management. This combination allowed us to provide a responsive and user-friendly interface.
Fuse.js Implementation
We recognized that users often make typos or remember only parts of a document's title or tags. Fuse.js's fuzzy search algorithm was ideal for handling these scenarios.

- Initialization: We instantiated Fuse.js with the documents array, specifying the keys (title, analysis.classification, analysis.contentTags) to be searched. The threshold parameter was carefully tuned to balance accuracy and tolerance, ensuring that relevant results were not missed.
const fuse = new Fuse(documents || [], {
keys: ["title", "analysis.classification", "analysis.contentTags"],
threshold: 0.3,
includeMatches: true,
});
- Detailed Matches: The includeMatches option was crucial, as it provided detailed information about the matched substrings. This allowed us to highlight the search terms within the DocumentPreviewCard components, improving result clarity.
- Search Functionality: The searchDocuments function dynamically filters the documents based on the user's query. When a query is entered, Fuse.js performs a fuzzy search, and the results are mapped to include both the document and the match information. This ensures that the UI reflects the search results in real time.
const searchDocuments = (query: string) => {
if (query) {
const searchResults = fuse.search(query);
setResults(
searchResults.map((result) => ({
item: result.item,
matches: result.matches ? [...result.matches] : undefined,
}))
);
} else {
setResults([]);
}
};
Sorting Implementation:
We wanted to give users control over how their documents are displayed. The Tabs component, combined with the sortDocuments function, allowed us to implement flexible sorting options.
- Sorting Logic: The sortDocuments function takes an array of documents and sorts them based on the sortBy state. We implemented sorting by date, name, and type, each with its own comparison logic.
const sortDocuments = (docs: Document[]) => {
const sorted = [...docs];
switch (sortBy) {
case "date":
return sorted.sort((a, b) => b.timestamp.getTime() - a.timestamp.getTime());
case "name":
return sorted.sort((a, b) => a.title.localeCompare(b.title));
case "type":
return sorted.sort((a, b) => a.analysis.classification.localeCompare(b.analysis.classification));
default:
return sorted;
}
};
- Dynamic Display: The documentsToDisplay variable dynamically selects the documents to be rendered. If there are search results, it sorts them; otherwise, it sorts all documents. This ensures that the sorting logic is applied consistently, regardless of whether the user is searching.
const documentsToDisplay = results.length > 0 ?
sortDocuments(results.map((result) => ({ ...result.item,
matches: result.matches })))
:
sortDocuments(Array.from(documents?.filter((doc) =>
doc.analysis.status === DocumentAnalysisStatus.COMPLETE) || []));
- User Interaction: The Tabs component provides an intuitive way for users to switch between sorting options. When a user selects a tab, the sortBy state is updated, triggering a re-render of the document list.
Dexie.js Integration:
Dexie.js plays a pivotal role in managing the document data locally.
- Live Queries: The useLiveQuery hook from dexie-react-hooks allows us to efficiently fetch and react to changes in the Dexie.js database. This ensures that the UI is always up-to-date with the latest document data, providing a seamless user experience.
const documents = useLiveQuery(
() => db.documents.where("analysis.status")
.equals(DocumentAnalysisStatus.COMPLETE)
.toArray(),
[]
);
- Data Filtering: We filter the documents to only display those with DocumentAnalysisStatus.COMPLETE. This ensures that the homepage only shows documents that have been fully processed.
Document Preview Cards and Tag Management
We wanted to create a visually appealing and informative document overview, with easy access to metadata and tag management.
Our Approach:
- Document Preview Cards: The DocumentPreviewCard components display document thumbnails and metadata. The thumbnail provides a visual representation of the document, while the metadata gives users a quick overview of the document's content.
<DocumentPreviewCard key={doc.id} document={doc} onClick={(e)=> { console.log(doc); }} />
- Document Information: The DocumentInfo component displays the document type, title, date, and tags. The Highlight component is used to highlight search terms within the document preview card, improving result clarity.
<Tag key={tag}> {getHighlightedText(tag.toUpperCase(), "analysis.contentTags")} </Tag>
- Tag Management: The Popover component provides a convenient way for users to add and remove tags directly within the cards. The Tag component allows for the deletion of tags, while the AddTagPopover component allows for the addition of new tags.
<AddTagPopover docId={id} analysis={analysis} tags={tags} />
File Upload Flow and Loading
File uploads can take time, and we wanted to provide clear feedback to the user during this process, so we created an imperative loading component. The ImperativeLoading component lets us control the loading steps programmatically, giving us fine-grained control over the loading feedback and ensuring that the user is always informed of the upload progress. More about this later!
<ImperativeLoading ref={loadingRef} steps={[ /* ... */ ]} />
File Dropzone
The UploadFiles component uses a FileDropzone to handle file uploads. This provides a drag-and-drop interface, making it easy for users to upload files. The dropzone will not proceed or show the confirmation dialog until the user provides a valid file.
Confirmation Dialog
The ConfirmDialog component allows the user to review the file details before submitting it for processing. This helps prevent accidental uploads and ensures that the user is aware of the file's metadata.
<ConfirmDialog
file={file}
setFile={setFile}
onSubmit={() => document.getElementById("file-form")
?.dispatchEvent(new Event("submit",
{ cancelable: true, bubbles: true }))
}/>
Loading Feedback and User Experience:
We wanted to create a loading experience that was both informative and engaging. The ImperativeLoading component allows us to provide step-by-step feedback during long operations. This ensures that the user is always aware of the progress and prevents frustration.
/* In our onSubmit */
await storeDocumentFile(formData)
.then((response) => {
loadingRef.current?.next();
return response.fid;
})
.then(async (fid) => {
return await addDocument({
fid,
title: file.name.substring(0, file.name.length - 4),
type: file.type,
size: file.size,
timestamp: new Date(),
analysis: defaultAnalysis, // empty analysis
});
})
.then(() => {
loadingRef.current?.next();
})
.catch((error) => {
console.error(error);
});
...
/* In the tsx */
<ImperativeLoading
ref={loadingRef}
steps={[
{
header: "This may take a few seconds",
message: "Uploading Document",
},
{
header: "This may take a few seconds",
message: "Processing Document",
},
{
header: "Upload complete",
message: "Your document successfully uploaded",
},
]}
/>
- Visual Cues: We used animations and visual cues, such as the AnimatedLoadingBook and Scribble image, to make the loading process more engaging. These elements help to keep the user's attention and prevent them from becoming bored or impatient.
- Smooth Transitions: The messages are animated to fade in and out, creating a smooth transition between steps. This helps to create a polished and professional user experience.
- Animation Management: The Loading component uses useEffect and useRef to manage the animation states and message transitions. This ensures that the animations are performed efficiently and that the UI remains responsive.
By focusing on user efficiency, intuitive design, and clear feedback, we created a homepage that empowers users to manage their documents effectively.
Multi-Model Comparison
The side-by-side model comparison feature provides users with a powerful way to evaluate multiple AI models simultaneously. This implementation focuses on creating a seamless experience for comparing analysis results and chatting with multiple models.
Modal Architecture
We implemented the comparison interface as a modal dialog to provide a distraction-free environment while maintaining context with the underlying document. This approach allows users to focus exclusively on model comparison without navigating away from their current context.
<AlertDialogContent className="bg-background rounded-xl min-w-1 w-[calc(100%-10rem)] min-h-3/4 p-0 max-w-[1200px_!important]">
<AlertDialogTitle hidden>Compare All Models</AlertDialogTitle>
<CompareModels models={models} />
<AlertDialogCancel asChild>
<Button
variant="ghost"
className="text-white border-none bg-transparent p-0 hover:bg-transparent absolute -top-10 -right-14 cursor-pointer"
>
<X className="min-w-8 min-h-8" />
</Button>
</AlertDialogCancel>
</AlertDialogContent>
The modal uses custom sizing with w-[calc(100%-10rem)] to create an optimal viewing area while maintaining margins, and min-h-3/4 to ensure sufficient vertical space without overwhelming the screen.
Tab-Based View Switching
To accommodate different comparison needs, we implemented a tab-based interface that toggles between analysis and chat modes. This provides a clean separation of concerns while maintaining a consistent UI.
<Tabs className="w-full h-full flex flex-col" defaultValue="analysis">
<TabsList className="mx-auto">
<TabsTrigger className="px-5" value="analysis" defaultChecked>
<Article className="size-5 mr-1.5" />
Analysis
</TabsTrigger>
<TabsTrigger className="px-8" value="chat">
<ChatCircleDots className="size-5 mr-1.5" />
Chat
</TabsTrigger>
</TabsList>
<TabsContent className="w-full h-full grid grid-cols-3 gap-8 pt-4" value="analysis">
{/* Analysis content */}
</TabsContent> <TabsContent className="w-full h-full flex flex-col gap-8 pt-4" value="chat">
{/* Chat content */}
</TabsContent>
</Tabs>
Each tab features an icon (Article or ChatCircleDots) to enhance visual recognition. The "Analysis" tab shows document insights from all models, while the "Chat" tab enables simultaneous conversations with multiple models.
Multi-Model Grid Layout
For effective side-by-side comparison, we implemented a responsive grid layout that adjusts based on the number of models being compared.
<TabsContent className="w-full h-full grid grid-cols-3 gap-8 pt-4" value="analysis">
{models.map((model) => {
return (
<div className="flex flex-col gap-3" key={model.versionedId}>
<div className="h-5 relative flex flex-row items-center justify-start">
<Image
src={model.provider.image}
alt="model"
width={model?.provider.image.width}
height={model?.provider.image.height}
className="h-full w-fit"
/>
</div>
{isLoading ? (
<DocumentAnalysisSkeleton className="h-full" />
) : (
<DocumentAnalysisSection
className="h-full"
analysis={/* Analysis data */}
documentType="Lecture Note"
/>
)}
</div>
);
})}
</TabsContent>
Each model's analysis is presented in a consistent format with the provider's logo at the top for clear visual differentiation. During loading, we display skeleton UI components to maintain layout consistency and provide feedback.
Synchronized Chat
A key feature of our implementation is the ability to ask a single question to all models simultaneously, enabling direct comparison of responses. This was achieved through a centralized message store.
// In the parent component
<LargeChatInput
className="w-[60%]"
onSubmit={(value)=> {
setUserMessages([
...userMessages,
{
content: value,
appended: false,
},
]);
}}
/>
// In the ModelChatBox component
useEffect(() => {
if (userMessages.length > 0) {
const lastUserMessage = userMessages[userMessages.length - 1];
if (!lastUserMessage.appended) {
setUserMessages([
...userMessages,
{ ...lastUserMessage, appended: true },
]);
append({
role: "user",
content: lastUserMessage.content,
});
}
}
// eslint-disable-next-line
}, [userMessages, setUserMessages])
When a user submits a question, it's stored in a central state with an appended flag set to false. Each model's chat component monitors this state and processes new messages, ensuring all models receive the same question simultaneously.
Individual Chat Interfaces
Each model has its own chat interface that maintains conversation history and displays responses. We use the useChat hook to manage these individual chats:
const { messages, isLoading, append } = useChat({
api: `/api/chat/${model.versionedId}`,
body: {
fileContent: analysis.content,
contentTags: analysis.contentTags,
documentType: analysis.classification,
},
});
The chat interface renders user and system messages with appropriate styling:
<ChatBox model={{ img: model.provider.image }} className="h-full">
{(!messages || messages.length === 0) && <StartChatting />}
{messages && messages.map((message) => {
if (message.role === "user") {
return (
<UserMessage key={message.id} className="self-end">
{message.content}
</UserMessage>
);
} else if (message.role === "assistant") {
return (
<SystemMessage key={message.id}>
{message.content}
</SystemMessage>
);
} else {
return null;
}
})} {isLoading && <LoadingGeneration className="w-full" />}
</ChatBox>
When no conversation exists, we display a prompt encouraging users to begin chatting. During response generation, a loading indicator provides visual feedback.
Loading States
To enhance user experience during data retrieval, we implemented loading states with skeleton UI components that maintain layout consistency while data is being fetched.
{isLoading ? (
<DocumentAnalysisSkeleton className="h-full" />
) : (
<DocumentAnalysisSection
className="h-full"
analysis={/* Analysis data */}
documentType="Lecture Note"
/>
)}
For chat responses, we use a dedicated loading component that provides visual feedback during response generation:
{isLoading && <LoadingGeneration className="w-full" />}
This approach maintains UI consistency throughout the loading process, preventing jarring layout shifts that could disrupt the user experience.
Responsive Design
The comparison interface is designed to be responsive, adapting to different screen sizes while maintaining usability. Key responsive features include:
- Grid Adaptation: Using grid-cols-3 for desktop with appropriate stacking on smaller screens.
- Centered Chat Input: The w-[60%] width for the chat input ensures it's appropriately sized on all devices.
- Flexible Modal Sizing: The modal's w-[calc(100%-10rem)] approach ensures proper margins regardless of screen size.
Challenges
Determining the visual brand
One challenge in this project was balancing an original design with Goodnotes' existing visual branding. As designers, we found ways to incorporate new micro-interactions, colors, and layouts while still keeping the original Goodnotes branding recognizable.
Connecting User Research with Client Needs
It was hard to strike a balance between the new findings from our research and creating a user flow that our client was satisfied with. Since this project was mostly AI/ML and coding heavy, the purpose of the design process was briefly ambiguous at the beginning, but we made sure to connect our cross-functional team to create something extraordinary.
Takeaways
Collaboration Leads to Learning
Working closely in a cross-functional team taught us how to be mindful of technical and design constraints and feasibility while creating a product.
The Devil's in the Details
There were multiple times when we thought a component was good enough to be set in our design system, but as we kept fine-tuning the details and producing different iterations, we grew more confident in creating fleshed-out, intentional components that were well thought out and purposeful.
Final Reflection
Working on this project with Goodnotes has been such a fulfilling and collaborative experience. Over the course of just six weeks, our team poured in thought, care, and creativity to build not just a prototype, but a vision for how AI could enhance the note-taking experience for real students like us.
From the early brainstorming sessions and user interviews, to diving into complex segmentation pipelines and finessing our UI animations, every step of this journey has been a true team effort.
What we're most proud of isn't just the tech we built or the polish of the UI: it's the fact that this tool was shaped around real needs.
We also learned a lot about design systems, OCR, how to collaborate with APIs, and how to build a product with empathy for users. The Goodnotes team gave us the freedom to explore, the guidance to stay grounded, and the encouragement to make this our own.
Most of all, we're proud of what we created together. This project felt like more than just a client engagement; it felt like building a tiny part of the future of learning.
Thanks for the journey, Goodnotes.