LlamaParseReader with TypeScript and Express.js
Parsing documents like PDFs can be a real pain. Thankfully, LlamaIndex has you covered. LlamaParseReader
uses LlamaCloud to parse documents and prepare them for use in your RAG applications.
I'll assume that if you are reading this article, you have already worked with LlamaIndex; this guide focuses on the document parsing that LlamaCloud offers.
If not, there is a phenomenal free video course on DeepLearning.ai which you can watch here.
Before we start, make sure you have an API key from LlamaIndex Cloud, stored as LLAMA_CLOUD_API_KEY in your .env file; we will need it later. We also use OpenAI as our LLM. You can get a key for that here.
This is an Express.js example, but the core logic could be lifted for any Node.js environment.
Install Dependencies
First, install the necessary packages:
npm install express llamaindex dotenv
npm install --save-dev @types/express typescript
Setup .env File
Create a .env file in the root of your project and add your API keys:
LLAMA_CLOUD_API_KEY=your_api_key_here
OPENAI_API_KEY=your_api_key_here
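A missing key usually only shows up later as a confusing failure inside a request, so it can help to check for both variables at startup. The following is a minimal sketch, not part of the original example: `missingEnvVars` and `REQUIRED_KEYS` are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper (not part of LlamaIndex or dotenv): returns the names of
// required variables that are unset or blank in the given environment object.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[]
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// The two keys this guide places in the .env file.
const REQUIRED_KEYS = ["LLAMA_CLOUD_API_KEY", "OPENAI_API_KEY"];

// Assumed usage in server.ts, right after dotenv's config() has run:
//   const missing = missingEnvVars(process.env, REQUIRED_KEYS);
//   if (missing.length > 0) {
//     throw new Error(`Missing environment variables: ${missing.join(", ")}`);
//   }
```

Failing fast like this keeps the error close to its cause instead of surfacing it on the first query.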
Write Some Code
Create a new TypeScript file, for example, server.ts. Below is the complete code to use LlamaParseReader with Express.js:
import express, { Request, Response } from "express";
import { config } from "dotenv";
import { VectorStoreIndex, OpenAI, Settings } from "llamaindex";
import { LlamaParseReader } from "llamaindex/readers/LlamaParseReader";

// Load environment variables from .env file
config();

// Set up LLM settings. I tend to use 3.5 because it's cheap and works well
// with all the use cases I've thrown at it with local docs.
Settings.llm = new OpenAI({ model: "gpt-3.5-turbo" });

const app = express();
const port = process.env.PORT || 3000;

app.use(express.json());

app.post("/query", async (req: Request, res: Response) => {
  try {
    const { query } = req.body;

    // A missing query is a client error, so answer with 400 and stop here
    if (!query) {
      res.status(400).send("Input is required.");
      return;
    }

    // Initialize the LlamaParseReader
    const reader = new LlamaParseReader({ resultType: "markdown" });

    // Load and parse the document
    const documents = await reader.loadData(
      "./src/data/writing-effectively.pdf"
    );

    // Create embeddings and store them in a VectorStoreIndex
    const index = await VectorStoreIndex.fromDocuments(documents);

    // Create a query engine
    const queryEngine = index.asQueryEngine();

    // Query the document using the query engine
    const { response } = await queryEngine.query({ query });

    // Return the response
    res.json({ response });
  } catch (err) {
    // Parsing, embedding, or LLM failures are server-side problems, hence 500
    console.error(err);
    res.status(500).send("Something went wrong.");
  }
});

app.listen(port, () => {
  console.log(`Server is running on port ${port}`);
});
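Note that the handler above parses the PDF and rebuilds the index on every request, which repeats the LlamaCloud and embedding calls each time. One possible way around that is to build the index once and reuse it. The sketch below shows only the caching part; `once` is a hypothetical helper written for this article, not a LlamaIndex API, and the commented usage assumes the same reader and index calls as the code above.

```typescript
// Hypothetical helper: wraps an async factory so its work runs at most once.
// Concurrent callers share the same in-flight promise.
function once<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      const pending = factory().catch((err) => {
        // Forget a failed attempt so the next call can retry
        cached = undefined;
        throw err;
      });
      cached = pending;
      return pending;
    }
    return cached;
  };
}

// Assumed usage in server.ts, outside the route handler:
//   const getIndex = once(async () => {
//     const reader = new LlamaParseReader({ resultType: "markdown" });
//     const documents = await reader.loadData("./src/data/writing-effectively.pdf");
//     return VectorStoreIndex.fromDocuments(documents);
//   });
// and inside the handler:
//   const index = await getIndex();
```

Caching the promise rather than the resolved value means two requests arriving at the same time still trigger only one parse.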
Compile and Run the Code
Ensure you have the TypeScript compiler installed and run the code:
npx tsc && node dist/server.js
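Make sure your TypeScript compiler is configured correctly in tsconfig.json. In particular, the command above runs dist/server.js, so the compiler's outDir needs to point at dist.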