Processing Structured & Unstructured Data with SuperAGI and LlamaIndex


SuperAGI's latest integration with LlamaIndex extends an agent's ability to understand and work with a wide range of data types and sources.

With LlamaIndex, AI agents in SuperAGI can now ingest data from:

Unstructured data sources, such as:

  • Documents and Raw Text Files: Like word processing documents or simple text notes (.docx, .txt)

  • PDFs: Digital documents

  • Videos and Images: Visual media formats (.jpg, .png, .mp4, etc.)

as well as structured data sources, such as:

  • Excel and CSV: Tabulated data where information is presented in rows and columns.

  • SQL databases: Relational databases where data is stored in tables with rows and columns.

or even semi-structured data sources such as Slack and Notion.


Data Processing in SuperAGI

Several steps are involved in fetching the data, processing it, and sending it through LlamaIndex to the vector database as vector node objects.

🔁 Resource Management & Data Conversion

Files and documents are uploaded to the SuperAGI Resource Manager, where LlamaIndex parses them and converts them into vector node objects. These objects are then stored in a vector database such as Redis, Chroma, Pinecone, or Qdrant.
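The conversion step above can be pictured with a small, self-contained sketch. It is not the SuperAGI or LlamaIndex implementation: `VectorNode`, `toy_embed`, and `file_to_nodes` are hypothetical names, the "embedding" is a hash-based placeholder rather than a real model, and a plain Python list stands in for the vector database.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class VectorNode:
    """Hypothetical stand-in for a vector node object."""
    text: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


def toy_embed(text: str, dims: int = 8) -> list[float]:
    """Placeholder embedding: counts words into hash buckets.

    A real pipeline would call an embedding model here instead.
    """
    vec = [0.0] * dims
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[digest[0] % dims] += 1.0
    return vec


def file_to_nodes(file_name: str, content: str, agent_id: int,
                  chunk_words: int = 50) -> list[VectorNode]:
    """Split a file's text into fixed-size word chunks and embed each one."""
    words = content.split()
    nodes = []
    for start in range(0, len(words), chunk_words):
        chunk = " ".join(words[start:start + chunk_words])
        metadata = {
            "file_name": file_name,
            "agent_id": agent_id,
            "chunk_index": start // chunk_words,
        }
        nodes.append(VectorNode(chunk, toy_embed(chunk), metadata))
    return nodes


# An in-memory list stands in for Redis, Chroma, Pinecone, or Qdrant.
vector_store: list[VectorNode] = []
vector_store.extend(file_to_nodes("notes.txt", "word " * 120, agent_id=1))
```

Attaching the file name and agent id to each node at ingestion time is what makes later per-run filtering possible.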


📝 Conversion to Vector Node Objects

The SuperAGI Resource Manager stores data as vectorized node objects, which allows fast and easy access. Alongside these node objects, SuperAGI also stores a summary of each file. It then creates a master summary of all files in the Resource Manager, which the agent can use depending on its goal and instructions.

🆎 Metadata Filtering and Database Support

Metadata filtering is primarily used to select the specific resources required for an agent run. Each agent run is associated with a unique identifier, the 'agent id', which serves as the key for filtering resources. The system can therefore identify and select only the resources relevant to a particular agent run, which improves the accuracy of data retrieval. The integration supports vector databases that natively support metadata filtering (Redis, Chroma, Pinecone, or Qdrant).
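As a rough illustration of the idea, the sketch below filters stored nodes by an agent id held in their metadata. The dictionary layout, the `agent_id` key name, and the `filter_by_agent` helper are assumptions made for this example; in practice the filter would be passed to the vector database rather than applied in Python.

```python
from __future__ import annotations


def filter_by_agent(nodes: list[dict], agent_id: int) -> list[dict]:
    """Keep only the nodes whose metadata matches the given agent id."""
    return [n for n in nodes if n["metadata"].get("agent_id") == agent_id]


# Made-up sample nodes belonging to two different agents.
stored_nodes = [
    {"text": "Q1 revenue table", "metadata": {"agent_id": 7}},
    {"text": "Chapter 3 of the book", "metadata": {"agent_id": 9}},
    {"text": "Q2 revenue table", "metadata": {"agent_id": 7}},
]

agent_7_nodes = filter_by_agent(stored_nodes, agent_id=7)
```

Here `agent_7_nodes` contains only the two revenue tables, so the resource belonging to the other agent never reaches retrieval.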

🔍 Running Query using QueryResourceTool

Once an agent run is initiated, SuperAGI agents can query these node objects using the 'QueryResource Tool'. The tool lets agents work with a large set of data resources and supplies the information an agent needs, at any iteration of the run, to accomplish its goals.
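Conceptually, such a query combines the agent id filter with a similarity search over the stored embeddings. The sketch below shows that combination under stated assumptions: `query_resource` is a hypothetical function, not the actual QueryResource Tool, and the three-dimensional embeddings and sample texts are invented for illustration.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_resource(nodes: list[dict], query_embedding: list[float],
                   agent_id: int, top_k: int = 2) -> list[str]:
    """Return the texts of the top_k most similar nodes for one agent."""
    # Step 1: metadata filter, so only this agent's resources are searched.
    candidates = [n for n in nodes if n["agent_id"] == agent_id]
    # Step 2: rank the remaining nodes by similarity to the query.
    ranked = sorted(
        candidates,
        key=lambda n: cosine(n["embedding"], query_embedding),
        reverse=True,
    )
    return [n["text"] for n in ranked[:top_k]]


# Made-up nodes; the last one belongs to a different agent.
nodes = [
    {"text": "Q2 revenue section", "embedding": [0.9, 0.1, 0.0], "agent_id": 7},
    {"text": "Office relocation notice", "embedding": [0.0, 0.2, 0.9], "agent_id": 7},
    {"text": "Q2 operating costs section", "embedding": [0.7, 0.6, 0.1], "agent_id": 7},
    {"text": "Revenue summary for another agent", "embedding": [1.0, 0.0, 0.0], "agent_id": 9},
]

results = query_resource(nodes, [1.0, 0.0, 0.0], agent_id=7)
```

Even though the other agent's node matches the query exactly, it is excluded by the filter before ranking.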

✅ Use Cases

  1. Financial Report Analysis: Users can upload a CSV of financial data, interact with it, and ask the agent to generate an analysis report.

  2. Book Chapter Summarization: Users can upload an EPUB or PDF of an entire book and instruct the agent to summarize a chapter. The agent generates the chapter summary and stores it in the output folder.


Sources: https://twitter.com/ishaanbhola/status/1675826595985231872 https://twitter.com/geeky_baller/status/1676916836368257024