Decoder: An AI Script to Summarise an Application Codebase
This is an overview of a script / application I wrote called “Decoder” (GitHub here). It parses all the files in an application folder (ignoring the big ones, data ones, node_modules and the like) to figure out what an application does, how the files inter-relate, and so on.
The purpose of Decoder is to create context that’s understandable not just by humans but by AI.
With AI, you can know just a little bit about web page coding — like, how to run a Python script or host a single-app webpage — and suddenly you’re like “Wow, I made a whole site!”
This is pretty awesome. But coding with AI quickly becomes very difficult when your application gets even mildly big. You have dozens of files, an elaborate application structure, lots of dependencies, configuration files, and so on… it gets cumbersome.
It gets harder and harder to pass context to AIs and for AI to remember concepts. It keeps forgetting past bits of code or how you were doing things. It also forgets decisions you previously made.
On top of that, sometimes you inherit a codebase. For example, I inherited a React Native application from my developer a few months ago. I’ve been trying to wrap my head around it and the thing is just too bloody big. I’m only barely competent with React; React Native is hard.
So, anyway, to “get to the bloody point already, Dana”: I wrote a script (obviously with the help of AI, but I think it’s redundant to point that out these days) that recursively goes through a folder structure, sends the contents to the Anthropic API (because Anthropic doesn’t suffer from the milquetoast treatment OpenAI seems to have) and asks it for a project summary.
Updates
Made some updates in late July to better handle directories and also to include .mjs files, as I’ve been busy creating Lambda modules.
The following description of “Decoder” was generated by the first version of the “Decoder” script.
What is Decoder?
Decoder is a TypeScript/Node.js application that generates comprehensive overviews of software projects. It takes a project directory as input and produces two key outputs:
- A text file containing the raw content of the analyzed project files.
- A Markdown file with a detailed description of the project, including its purpose, architecture, file structure, and potential areas for improvement.
Application Architecture
- Language: TypeScript (compiled to JavaScript)
- Runtime: Node.js
- Key Dependencies:
- @anthropic-ai/sdk: For AI-powered project analysis
- commander: For parsing command-line arguments
- dotenv: For loading environment variables
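To make the role of the Anthropic dependency a bit more concrete, here is a minimal sketch of what a description call could look like. It is not the actual Decoder code. The `MessageClient` interface is a hypothetical stand-in modelled loosely on the SDK's `messages.create` call (passing a real client in may need a cast), and `buildPrompt`, the `max_tokens` value and the prompt wording are all assumptions.

```typescript
// Hypothetical minimal client shape, modelled loosely on the
// `messages.create` call of @anthropic-ai/sdk. Not the SDK's real types.
interface ContentBlock {
  type: string;
  text?: string;
}

interface MessageClient {
  messages: {
    create(params: {
      model: string;
      max_tokens: number;
      messages: { role: "user" | "assistant"; content: string }[];
    }): Promise<{ content: ContentBlock[] }>;
  };
}

// Assumed prompt wording; it asks for the sections the description contains.
function buildPrompt(projectContent: string): string {
  return [
    "Describe this software project: its purpose, architecture,",
    "file structure, and potential areas for improvement.",
    "",
    projectContent,
  ].join("\n");
}

// Sends the collected file content and returns the text of the reply.
async function generateDescription(
  client: MessageClient,
  projectContent: string,
  model: string, // the caller picks the model; none is assumed here
): Promise<string> {
  const response = await client.messages.create({
    model,
    max_tokens: 4096, // assumed limit
    messages: [{ role: "user", content: buildPrompt(projectContent) }],
  });
  // Keep only text blocks and join them into one string.
  return response.content
    .filter((block) => block.type === "text")
    .map((block) => block.text ?? "")
    .join("\n");
}
```

Passing the client in as a parameter keeps the sketch testable with a fake client, so no API key is needed to try it.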
Folder and File Structure
decoder/
├── decoder.ts # Main application logic
├── decoder.js # Compiled JavaScript version of decoder.ts
├── package.json # Project metadata and dependencies
└── tsconfig.json # TypeScript configuration
Core Functionality
The heart of Decoder lies in its decoder.ts file, which contains several key functions:
- shouldIncludeFile: Determines if a file should be included in the analysis
- analyzeDirectory: Recursively analyzes a directory and its contents
- generateDescription: Uses the Anthropic AI to generate a project description
- main: Orchestrates the overall application flow
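As a rough illustration of the first two functions, here is a minimal sketch of the filter and the recursive walk. The function names come from the list above, but the bodies are my guess, not the real code: the ignored folders, the extension list, the size cut-off and the `--- path ---` separator are all assumptions.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed rules: the post only says big files, data files and
// node_modules "and the like" are skipped.
const IGNORED_DIRS = new Set(["node_modules", ".git", "dist"]);
const INCLUDED_EXTENSIONS = new Set([".ts", ".tsx", ".js", ".jsx", ".mjs", ".md"]);
const MAX_FILE_BYTES = 100 * 1024; // assumed size cut-off

// True when the file has a known source extension and is small enough.
function shouldIncludeFile(filePath: string, sizeInBytes: number): boolean {
  const extension = path.extname(filePath).toLowerCase();
  return INCLUDED_EXTENSIONS.has(extension) && sizeInBytes <= MAX_FILE_BYTES;
}

// Walks `dir` recursively and concatenates every included file,
// each preceded by a header line with its path relative to `root`.
function analyzeDirectory(dir: string, root: string = dir): string {
  let content = "";
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (!IGNORED_DIRS.has(entry.name)) {
        content += analyzeDirectory(fullPath, root);
      }
    } else if (entry.isFile() && shouldIncludeFile(fullPath, fs.statSync(fullPath).size)) {
      const relativePath = path.relative(root, fullPath);
      content += `--- ${relativePath} ---\n${fs.readFileSync(fullPath, "utf8")}\n\n`;
    }
  }
  return content;
}
```

The returned string is the kind of raw content that would end up in the text output file and in the prompt.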
How to Use Decoder
- Setup:
- Ensure Node.js is installed
- Clone the repository
- Run npm install to install dependencies
- Create a .env file containing your ANTHROPIC_API_KEY
- Running the application:
- Build the project: npm run build
- Run the decoder: ./dist/decoder.js [path-to-project-directory]
- Output:
- A content file (content-[foldername]-[timestamp].txt) will be generated in the current directory
- A project_description.md file will be created in the analyzed project directory
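The two output locations described above could be derived along these lines. This is a sketch only: `contentFileName` and `descriptionPath` are hypothetical helper names, and the timestamp format (an ISO string with colons and dots swapped for hyphens) is an assumption, since only the content-[foldername]-[timestamp].txt pattern is given.

```typescript
import * as path from "node:path";

// Builds "content-[foldername]-[timestamp].txt" for the analyzed folder.
// Timestamp format is assumed: ISO 8601 with ":" and "." replaced by "-".
function contentFileName(projectDir: string, now: Date = new Date()): string {
  const folderName = path.basename(path.resolve(projectDir));
  const timestamp = now.toISOString().replace(/[:.]/g, "-");
  return `content-${folderName}-${timestamp}.txt`;
}

// The description is written inside the analyzed project directory.
function descriptionPath(projectDir: string): string {
  return path.join(projectDir, "project_description.md");
}
```

For a folder called my-app analyzed on 1 July 2024 at 12:30:45 UTC, the first helper would give content-my-app-2024-07-01T12-30-45-000Z.txt.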
Areas for Improvement
While Decoder is a powerful tool, there’s always room for enhancement:
- Error Handling: Implement more robust error handling for file system operations and API calls.
- Configuration: Make file inclusion/exclusion rules configurable.
- Scalability: Implement streaming or chunking for processing large projects.
- Security: Ensure proper handling of sensitive information.
- Testing: Add unit tests to improve reliability.
- Documentation: Enhance inline comments and generate API documentation.
- AI Model Dependency: Consider implementing fallback options or local processing capabilities.
- Rate Limiting: Implement rate limiting for API calls.
- Progress Indication: Add a progress bar or status updates for large projects.
- Output Formatting: Enhance project description output with better formatting and syntax highlighting.
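Two of these items, chunking and coping with rate limits, are easy to sketch. The helpers below are hypothetical and not part of Decoder today: `chunkByLines` splits the collected content on line boundaries, and `withRetry` retries a failing call with exponential backoff. The default attempt count and delay are assumptions.

```typescript
// Splits text into chunks of at most `maxChars` characters, breaking only
// at line boundaries. A single line longer than `maxChars` is kept whole
// and becomes its own oversized chunk.
function chunkByLines(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let size = 0;
  for (const line of text.split("\n")) {
    // +1 accounts for the newline that joins this line to the previous one.
    const added = line.length + (current.length > 0 ? 1 : 0);
    if (current.length > 0 && size + added > maxChars) {
      chunks.push(current.join("\n"));
      current = [line];
      size = line.length;
    } else {
      current.push(line);
      size += added;
    }
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}

// Runs `task`, retrying on any error with exponential backoff:
// baseDelayMs, then 2x, then 4x, and so on. Rethrows the last error.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Each chunk could then be summarised in its own `withRetry` call, with the partial summaries combined afterwards. That combining step is left out here.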
What’s next?
(This part is written by me!)
Decoder currently creates human-readable descriptions. This works, and it even works for AI. But it might be more logical to output a structured document instead, in JSON or some other organised data structure.
In the future, it’ll be used as the basis for application development, debugging, and extension, so I’ll have to be much more granular in the way it’s written. I might include these as options for the script, but we’ll see.
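To give a feel for what a structured version might look like, here is one possible shape. It is purely a sketch: the `ProjectSummary` and `FileSummary` types and their fields are made up for illustration, and nothing like this exists in Decoder yet.

```typescript
// Hypothetical structured output; every field name is an assumption.
interface FileSummary {
  path: string;
  purpose: string;
  dependsOn: string[]; // paths of other files this file relies on
}

interface ProjectSummary {
  name: string;
  purpose: string;
  files: FileSummary[];
  improvements: string[];
}

// Serialises the summary as indented JSON, ready to write to disk
// or paste into a prompt.
function toJson(summary: ProjectSummary): string {
  return JSON.stringify(summary, null, 2);
}

// Example of a query that structured data makes easy:
// which files depend on a given file?
function filesDependingOn(summary: ProjectSummary, target: string): string[] {
  return summary.files
    .filter((file) => file.dependsOn.includes(target))
    .map((file) => file.path);
}

// Sample data based on the Decoder file structure.
const example: ProjectSummary = {
  name: "decoder",
  purpose: "Generates overviews of software projects",
  files: [
    { path: "decoder.ts", purpose: "Main application logic", dependsOn: ["package.json"] },
    { path: "package.json", purpose: "Project metadata and dependencies", dependsOn: [] },
  ],
  improvements: ["Add unit tests"],
};
```

With the information in this form, a question like "what breaks if I change this file?" becomes a lookup rather than something the AI has to infer from prose.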
Thanks for writing this up and creating the code. I’m right in the middle of figuring out a complex codebase consisting of Astro+Ghost. Looking forward to exploring your writing about AI