New Webhook Events
We’ve introduced two additional webhook events to help track file sync statuses:

`FILE_CREATED`: This event fires when a user queues up a file to be synced for the first time. The body of the webhook contains a list of `file_ids` for files that were created in the same upload; multiple events can fire for the same upload if many files were queued.

`ALL_UPLOADED_FILES_QUEUED`: This event fires when every item in an upload has been queued for sync, including all children of folders in the upload. The body contains the upload’s `request_id`.
A couple of notes:

Both `file_ids` and `request_ids` can be used to filter for files in `/user_files_v2`.

A `request_id` is now always generated for an upload to support the `ALL_UPLOADED_FILES_QUEUED` webhook. Previously, it was only generated by the user (unless you’re using Carbon Connect) and passed to us as a parameter. You may still pass your own `request_id` and we’ll use it; otherwise, we’ll generate a `request_id` on behalf of the user’s upload.

These two webhooks are currently supported for 3rd party data sources only. Support for web scrapes and local file uploads is coming soon.
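As an illustration, a webhook receiver might dispatch on the event type like the sketch below. The payload field names (`webhook_type`, `file_ids`, `request_id`) are assumptions for this sketch, not Carbon’s documented schema; consult the webhook docs for the exact format.

```python
def handle_webhook(payload: dict) -> str:
    """Minimal dispatcher for the two new events (illustrative only)."""
    event = payload.get("webhook_type")
    if event == "FILE_CREATED":
        # Body carries the file_ids created in this upload; several
        # FILE_CREATED events may arrive for one large upload.
        file_ids = payload.get("file_ids", [])
        return f"queued {len(file_ids)} file(s)"
    if event == "ALL_UPLOADED_FILES_QUEUED":
        # Body carries the upload's request_id once every item is queued.
        return f"upload {payload.get('request_id')} fully queued"
    return "ignored"
```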
You can find more details here.
GitHub Connector
We launched our GitHub integration today, which syncs pages from both public and private repositories.

The Carbon Connect `enabledIntegration` slug for GitHub is `GITHUB`. You’ll need to update to `2.0.0-beta19` to access the new screen.

Users should first submit their GitHub username and access token to our integration endpoint at `/integrations/github`. Then you can use our global endpoints for listing and syncing specific files in different repositories:

List files from repositories with the global endpoint `/integrations/items/list`

Sync files from repositories with the global endpoint `/integrations/files/sync`
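The two-step flow above can be sketched as request payloads. The body field names (`username`, `access_token`, `data_source_id`) are illustrative assumptions; consult the API reference for the exact parameter names.

```python
# Step 1: connect the account (field names are assumptions for this sketch).
connect_request = {
    "endpoint": "/integrations/github",
    "body": {"username": "octocat", "access_token": "ghp_..."},
}

# Step 2: list repository files via the global endpoint, scoped to the
# data source created by the connect step (id is a placeholder).
list_request = {
    "endpoint": "/integrations/items/list",
    "body": {"data_source_id": 123},
}
```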
See more specifics about our GitHub integration here.
Set Max Files Per Upload
A new user-level parameter, `max_files_per_upload`, has been introduced that can be modified via the `/update_users` endpoint. It determines the maximum number of files a user can upload in a single request.

Files beyond this maximum will be moved into the `SYNC_ERROR` status, with webhooks fired to alert you.

You can check the `file_single_upload_limit` set for a particular user via the `/user` endpoint. Find more details here.
Important Update: The parameter `max_files` now serves to establish the overall file upload limit for a user across all uploads.
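For illustration, an `/update_users` request body setting the new per-upload cap might look like the sketch below; the `customer_ids` field is assumed from the other `/update_users` examples in this changelog, and the values are placeholders.

```python
# Hedged sketch of an /update_users request body (values are placeholders).
update_users_body = {
    "customer_ids": ["user_1"],      # users to update
    "max_files_per_upload": 50,      # cap on files per single upload request
}
```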
Add `include_all_children` to Embeddings Endpoint
Added param `include_all_children` to the `/embeddings` endpoint. When this param is set to `true`, the search runs over all filtered files as well as their children. Filters applied to the endpoint extend to the returned child files.
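As a sketch, an `/embeddings` request body using the new param might look like this; the `query`, `k`, and `file_ids` fields are assumptions for the illustration, not a verbatim copy of the documented schema.

```python
# Illustrative /embeddings request body: search file 42 and its children.
embeddings_body = {
    "query": "quarterly revenue",
    "k": 5,                        # number of chunks to return
    "file_ids": [42],              # parent file(s) to filter on
    "include_all_children": True,  # also search the children of file 42
}
```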
In-House File Picker for Confluence and Salesforce
We’re excited to introduce our in-house file picker, starting with Confluence and Salesforce. Our in-house file picker is still in beta, but you can test it out by manually running `npm install carbon-connect@2.0.0-beta13`.
With this update, end users gain the ability to directly select and upload specific files from Confluence and Salesforce. Previously, this functionality was unavailable as neither platform offered their own dedicated file pickers.
When `syncFilesOnConnection` is set to `false`, our file picker will be enabled.
Hiding 3rd-Party File Picker
The endpoints `/integrations/oauth_url` and `/integrations/connect` now support a new boolean parameter named `enable_file_picker`.

When `enable_file_picker` is set to `true` (the default), a button is displayed on the success page; clicking it opens the file picker associated with the respective source.

Conversely, setting `enable_file_picker` to `false` hides the file picker button on the success page. In that case, end users will be directed to use custom or in-house file pickers for file selection.
Sync Outlook and Gmail Attachments
We’ve introduced a new property called `sync_attachments`, which can be specified when syncing via the `/integrations/gmail/sync` and `/integrations/outlook/sync` endpoints. By default, this property is set to `false`.

Setting `sync_attachments` to `true` enables Carbon to automatically sync file attachments from the corresponding emails. This includes not only traditional file attachments but also files (such as images) added in-line within emails.

Each file attachment is assigned a unique `file_id`, with the `parent_id` corresponding to the email the file was attached to.

Please note that the same rules that apply to our file uploads also apply to attachments in terms of file size and supported extensions.
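For illustration, a `/integrations/gmail/sync` request body enabling attachment sync might look like the sketch below; the `data_source_id` parameter is described elsewhere in this changelog, and its value here is a placeholder.

```python
# Illustrative /integrations/gmail/sync request body.
gmail_sync_body = {
    "data_source_id": 123,     # which Gmail connection to sync (placeholder)
    "sync_attachments": True,  # also sync attachments and in-line images
}
```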
Set User File Limits
You have the flexibility to set the maximum number of files that a unique customer ID can upload using the `file_upload_limit` field on the `/update_users` endpoint. This value can be adjusted as needed, allowing you to tailor it according to your own plan limits.

You can then check the upload limit set for a specific user via the `custom_limits` object on the `/user` endpoint. See details here.
Flags for OCR
Added `ocr_job_started_at` to the `user_files_v2` response to denote whether OCR was enabled for a particular file.

Added additional OCR properties to be returned via `ocr_properties`, including whether table parsing was enabled.

See details here.
Role Management in Customer Portal
You now have the ability to manage who in your organization can create, delete, and view API keys.
Here’s a breakdown of the current roles available:
Admin: This role is empowered to both create and delete API keys.
User: Users with this role can view API keys.
Moving forward, these roles will determine user permissions and access across different sections of the Carbon Customer Portal.
You can access the customer portal via portal.carbon.ai
Expanded OCR Support in Carbon Connect
The prop `useOCR` can now be enabled on the integration level for the following connectors (in addition to local files):

OneDrive
Dropbox
Box
Google Drive
Zotero
SharePoint

The prop `parsePdfTablesWithOcr` can now be enabled on the integration level to parse tables with OCR when `useOCR` is set to `true`.

Please note OCR support is only applicable for PDFs at the moment. You can find more details here.
Return `chunk_index` on the `/embeddings` Endpoint
We now return the `chunk_index` for specific chunks returned via the `/embeddings` endpoint. You can find more details here.
Migrations between Embedding Models
You can now request migrations between embedding models with minimal downtime.
Email me if you’re interested. The cost per migration (not including embedding token costs) starts at $850 one-time.
New `request_id` Field
Carbon now accommodates the inclusion of a `request_id` within OAuth URLs, global sync endpoints, and custom sync endpoints (such as Gmail, Outlook, etc.), allowing users to define it as needed. Non-OAuth-URL endpoints that auto-sync upon connection (e.g., Freshdesk, Gitbook) also support this value. The `request_id` serves as a filter for files through `user_files_v2`.

With Carbon Connect, setting the `useRequestIds` parameter to `true` will trigger automatic assignment of the `request_id`. This `request_id` will be returned in `INITIATE` and `ADD`/`UPDATE` callbacks. It’s essential to note that this configuration adjustment is applicable at the component level rather than the integration level.

This enhancement is part of version `2.0.0-beta8`. Find more details here.
`syncFilesOnConnection` For More Data Sources
We’ve added the `sync_files_on_connection` parameter to the `oauth_url` endpoint for the following data sources: Intercom, Salesforce, Zendesk, Confluence, Freshdesk, and Gitbook.

This parameter is also accessible for each `enabledIntegration` in Carbon Connect. You can find more information about this here.

By default, this parameter is set to `true`. When enabled, all files will be synchronized automatically after a user connects their account. This is particularly useful when a user connects a data source that doesn’t have a built-in file picker.
Delete Child Files Based on Parent ID
Added a flag named `delete_child_files` to the `delete_files` endpoint. When set to `true`, it will delete all files that have the same `parent_file_ids` as the file submitted for deletion. This flag defaults to `false`. Find more details here.
`upload_chunks_and_embeddings` Updates
You can now upload only chunks to Carbon via the `upload_chunks_and_embeddings` endpoint and we can generate the embeddings for you. This is useful for migrations where you want to move between embedding models and vector databases.

In the API request, you can exclude embeddings and set `chunks_only` to `true`. Then, include your embedding model API key (OpenAI or Cohere) under `custom_credentials`:

{ "api_key": "lkdsjflds" }

Make sure to include some delay between requests. There are also stricter limits on how many embeddings/chunks can be uploaded per request when `chunks_only` is `true`: each request can only include 100 chunks.
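Given the 100-chunk cap and the advice to pace requests, a client-side uploader might batch its chunks before sending. This helper is purely illustrative, not part of the Carbon SDK:

```python
def batch_chunks(chunks, max_per_request=100):
    """Split a chunk list into request-sized batches; chunks_only uploads
    are capped at 100 chunks per request."""
    return [chunks[i:i + max_per_request]
            for i in range(0, len(chunks), max_per_request)]
```

In a real uploader you would also add a delay (e.g. `time.sleep`) between batch requests, per the note above.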
Data Source Connections with Pre-Existing Auth
If you’re using our white labeling add-on, we added a new POST endpoint `/integrations/connect` so customers can bypass the authentication flow on Carbon by directly passing in an access token.

The request takes an authentication object that contains all the necessary pieces of data to connect to a user’s account. The object will vary by data source, and a list specifying the required keys can be found in our docs. If the connection is successful, the upserted data source will be returned.

This endpoint also returns a sync URL for some data source types that will initiate the sync process.
Improvements to CSV, TSV, XLSX, GSheet Parsing
You now have the option to chunk CSV, TSV, XLSX, and Google Sheets files by tokens via the `chunk_size` parameter and/or by rows via the `max_items_per_chunk` parameter. When a file is processed, we add rows to a chunk until adding the next row would exceed `chunk_size` or `max_items_per_chunk`.

If a single row exceeds `chunk_size` or the embedding model’s token limit, the file’s `sync_error_message` will point out which row has too many tokens.

For example:

If each CSV row is 250 tokens, with a `chunk_size` of 800 tokens and no `max_items_per_chunk` set, then each chunk will contain 3 CSV rows.

If each CSV row is 250 tokens, with a `chunk_size` of 800 tokens and `max_items_per_chunk` set to 1, then each chunk will contain 1 CSV row.

Consequently, it is essential to ensure that the number of tokens in a CSV row does not surpass the token limits established by the embedding models. Token counting is currently only supported for OpenAI models.
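The chunking rule above can be sketched as a small simulation (per-row token counts in, chunks out). This is an illustrative model of the behavior, not Carbon’s actual implementation:

```python
def chunk_rows(row_tokens, chunk_size, max_items_per_chunk=None):
    """Group rows into chunks, adding rows until the next row would exceed
    chunk_size tokens or max_items_per_chunk rows."""
    chunks, current, tokens = [], [], 0
    for t in row_tokens:
        over_tokens = tokens + t > chunk_size
        over_items = (max_items_per_chunk is not None
                      and len(current) >= max_items_per_chunk)
        if current and (over_tokens or over_items):
            chunks.append(current)   # close the current chunk
            current, tokens = [], 0
        current.append(t)
        tokens += t
    if current:
        chunks.append(current)
    return chunks
```

With 250-token rows and `chunk_size=800`, this reproduces the 3-rows-per-chunk example above.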
You can find more details here.
Improvements to OCR
Table parsing in PDFs has been improved significantly with this most recent OCR update.

To use the enhanced table parsing features, you need to set `parse_pdf_tables_with_ocr` to `true` when uploading PDFs (`use_ocr` must also be `true`).

Any tables parsed when `parse_pdf_tables_with_ocr` is `true` have their own chunk(s) assigned to them. These chunks can be identified by the presence of the string `TABLE` in `embedding_metadata.block_types`. The format of these tabular chunks is the same as that of CSV-derived chunks.

Using this table-parsing feature in conjunction with hybrid search should provide much better search results than before (assuming the PDF has tables that need to be searched over).

If you’re using OCR, we now also return metadata such as coordinates and page numbers even if `set_page_as_boundary` is set to `false`. Specifically, we return the bounding box coordinates as well as the start and end page number of the chunk.
In the event that `pg_start` < `pg_end`, you should interpret the bounding box coordinates slightly differently: `x1` and `x2` will correspond to the minimum `x1` and maximum `x2` over all pages for the chunk; `y1` will correspond to the upper-most coordinate of the part of the chunk on `pg_start`, and `y2` will correspond to the bottom-most coordinate of the part of the chunk on `pg_end`.
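One way to reason about the multi-page convention: given a per-page box for each page a chunk spans, the reported values combine as below. This helper is purely illustrative (the API returns the combined values directly; you never compute them yourself):

```python
def chunk_bbox(page_boxes):
    """Combine per-page (x1, y1, x2, y2) boxes for one chunk following the
    multi-page convention: x1/x2 span all pages, y1 comes from the first
    page, y2 from the last page."""
    pg_start, pg_end = min(page_boxes), max(page_boxes)
    return {
        "pg_start": pg_start,
        "pg_end": pg_end,
        "x1": min(b[0] for b in page_boxes.values()),
        "x2": max(b[2] for b in page_boxes.values()),
        "y1": page_boxes[pg_start][1],  # top of the chunk on its first page
        "y2": page_boxes[pg_end][3],    # bottom of the chunk on its last page
    }
```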
Carbon Connect 2.0 (Beta)
We are thrilled to announce the beta launch of Carbon Connect 2.0, with the following improvements:
Support multiple active accounts per data source.
Improved data source list.
Built-in user interface for users to view and re-sync files per account.
Ability for users to directly disconnect active accounts.
To install Carbon Connect 2.0, run `npm install carbon-connect@2.0.0-beta5`. It is not treated as the latest version of Carbon Connect, so you won’t get this version automatically.

A few other important updates for Carbon Connect 2.0:

We’ve removed file details from the payload of `UPDATE` callbacks. If you used to get files this way, you’ll now need to switch to using our SDK or API to fetch the updated files when a data source updates.

When specifying embedding models, make sure to use a format like embeddingModel={EmbeddingGenerators.OPENAI_ADA_LARGE_1024} instead of just writing out a string.

You can hide our built-in UI for viewing and re-syncing files using the `showFilesTab` param at either the global component or `enabledIntegration` level.
Scheduled Syncs Per User and Data Source
Control user and data source syncing using the `/update_users` endpoint, allowing organizations to enable syncing for particular users and data source types. The endpoint accepts a list of user IDs and data sources, with an option to enable syncing for all sources using the string `'ALL'`. Each request supports up to 100 customer IDs.

In the following example, future Gmail accounts for the specified users will automatically have syncing enabled according to the provided settings:

{ "customer_ids": ["swapnil@carbon.ai", "swapnil.galaxy@gmail.com"], "auto_sync_enabled_sources": ["GMAIL"] }
Find more details in our documentation here.
Note: This update is meant to replace our file-level sync logic and any existing auto-syncs have been migrated over to use this updated logic.
Delete Files Based on Filters
We added the `/delete_files_v2` endpoint, which allows customers to delete files via the same filters as `/user_files_v2`.

We plan to deprecate the `/delete_files` endpoint in a month. Find more details in our documentation here.
Filtering for Child Files
We added the ability to include all descendant (child) files on both `/delete_files_v2` and `/user_files_v2` when filtering. Filters applied to the endpoint extend to the returned child files.

We plan to deprecate the `parent_file_ids` filter on the `/user_files_v2` endpoint in a month.
Customer Portal v1
We’ve officially launched v1 of our Customer Portal - portal.carbon.ai
You can currently manage your API keys directly via the Portal, and we plan to release the following functionality next quarter:
User management
Usage monitoring
Billing management
For current customers, you can reset your password with the email provided to Carbon to gain access. If you don’t know the email you have on file, DM me!
`/integrations/items/list` Improvements
We are implementing four distinct filters: `external_ids`, `ids`, `root_files_only`, and `name`, each meant to filter data based on its respective field.

The `root_files_only` filter will exclusively return top-level files. However, if a `parent_id` is specified, then `root_files_only` can’t be specified, and vice versa.

The `external_url` field has been added to the response body of the `/integrations/items/list` endpoint. See more details here.
Multiple Active Accounts Per Data Source
Carbon now supports multiple active accounts per data connection!
We’ve introduced two new parameters across various API endpoints to support this functionality across all our connectors. While these parameters are optional for users with a single data source of each type, they become mandatory when managing multiple accounts.
`/integrations/oauth_url`

`data_source_id`: Specifies the data source from which synchronization should occur when dealing with multiple data sources of the same type.

`connecting_new_account`: This parameter is utilized to consistently generate an OAuth URL as opposed to a sync URL. A sync URL is the destination where users are redirected after a successful OAuth authentication to synchronize their files. While this parameter can be skipped when adding the first data source of that type, it should be explicitly specified for subsequent additions.

`/integrations/s3/files`, `/integrations/outlook/sync`, `/integrations/gmail/sync`

`data_source_id`: Used to specify the data source for synchronization when managing multiple data sources of the same type.

`/integrations/outlook/user_folders`, `/integrations/outlook/user_categories`, `/integrations/gmail/user_labels`

`data_source_id`: Specifies the data source to be utilized when there are multiple data sources of the same type.

Note that the following endpoints already have a mandatory requirement to pass in a `data_source_id`: `/integrations/items/sync`, `/integrations/items/list`, `/integrations/files/sync`, `/integrations/gitbook/spaces`, `/integrations/gitbook/sync`
New Embedding Models
We now support embedding generation using OpenAI’s `text-embedding-3-small` and `text-embedding-3-large` models.

To define the embedding model, utilize the `embedding_model` parameter in the POST body for the `/embeddings` and other API endpoints. By default, if no specific model is provided, the system will use `OPENAI` (the original Ada-2). Find more details on the models available here.
Return HTML for Webpages
The `presigned_url` field under `user_files_v2` now returns a pre-signed URL to the raw HTML content for each web page. The `parsed_text_url` field still returns a pre-signed URL for the corresponding plain text. Find more details here.
Return Website Tags in File Metadata
The `file_metadata` field under `user_files_v2` now returns `og:image` and `og:description` for each web page. Find more details here.
Omit Content by CSS Selector
You can now exclude specific CSS selectors from web scraping. This ensures that text content within these elements does not appear in the parsed plaintext, chunks, and embeddings. Useful for omitting irrelevant elements, such as headers or footers, which might affect semantic search results.
The `web_scrape` request object supports a new field: `css_selectors_to_skip: Optional[list[str]] = []`

Find more details here.
JSON File Support
We’ve added support for JSON files via local upload and 3rd party connectors.
How It Works:
The parser iterates through each object in a file and flattens it. Keys on the topmost level remain the same, but nested keys are transformed into the dot-separated path to reach the key’s value. Each component of the path is either a string for a nested object or an integer for a nested list.
`max_items_per_chunk` is a parameter that determines how many JSON objects to include in a single chunk. A new chunk is created when either the `max_items_per_chunk` or `chunk_size` limit is reached. For example:

If each JSON object is 250 tokens, with a `chunk_size` of 800 tokens and no `max_items_per_chunk` set, then each chunk will contain 3 JSON objects.

If each JSON object is 250 tokens, with a `chunk_size` of 800 tokens and `max_items_per_chunk` set to 1, then each chunk will contain 1 JSON object.
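The flattening described above can be sketched as follows. This is an illustrative model of the parser’s behavior (the actual implementation may differ in details):

```python
def flatten(obj, prefix=""):
    """Flatten nested JSON into dot-separated key paths: string components
    for nested objects, integer indices for nested lists."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}  # leaf value
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat
```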
Learn more details here.
Gitbook Connector
We launched our Gitbook integration today, which syncs pages from public and shared spaces.

The Carbon Connect `enabledIntegrations` value for Gitbook is `GITBOOK`.

Gitbook does not come with a pre-built file selector, so we added 2 endpoints for listing and syncing Gitbook spaces:

List all Gitbook spaces with `/integrations/gitbook/spaces` (API Reference)

Sync multiple spaces at once with `/integrations/gitbook/sync` (API Reference)
You can also use our global endpoints for listing and syncing specific pages in Gitbook spaces:
List pages in spaces with the global endpoint `/integrations/items/list`

Sync pages in spaces with the global endpoint `/integrations/files/sync`
Note: Spaces are treated like folders via the Carbon API.
See more specifics about our Gitbook integration here.
Note: our Gitbook page parser is still in beta, so feedback is much appreciated!
Delete Endpoint Update
We’re transitioning file deletion from sync to async processing.
This means that the `FILE_DELETED` webhook event will not fire immediately; instead, it fires when the file is actually deleted.

We are also limiting deletions to 50 files per `/delete_files` request to limit the load on our servers. We advise spacing out delete requests every 24 hours.
Pinecone Integration
We’ve launched our Pinecone destination connector! We offer support for both pod-based and serverless offerings.
Carbon seamlessly updates your Pinecone instance with the latest embeddings upon processing user files. Users gain full access to Carbon’s API endpoints, including hybrid search for supported sparse vector storage.
Find more details here.
New Carbon SDKs
Moving forward, we will be able to provide support for a greater number of SDKs and promptly release SDK support for API updates. If there is a language for which you want us to add SDK support, we should be able to turn that around in less than a week.
We’re adding support for the following languages today:
The current JavaScript SDK will continue to be supported for the next month, and it will remain available longer term. However, new features will only be supported in the new TypeScript SDK moving forward.
Delete Users Endpoint
Added an endpoint `/delete_users` that takes an array of customer IDs and deletes all those users. Deleting a user revokes all of the user’s OAuth connections and deletes all their files, embeddings, and chunks.
The request format is:
{ "customer_ids": ["USER_1", "USER_2", "USER_3"] }
Find more details here.
Salesforce Connector is Live
All articles from an end user’s Salesforce Knowledge can be listed and synced via the global API endpoints `/integrations/items/list` and `/integrations/files/sync`. The Carbon Connect integration (launching tomorrow) will sync all articles by default.

The `enabledIntegrations` value is `SALESFORCE`. You can find more info here.
Outlook Folders
After connecting your Outlook account, you can use the `/integrations/outlook/user_folders` endpoint to list all of your folders on Outlook. This includes both system folders like `inbox` and user-created folders. Find more details here.
Gmail Labels
After connecting a Gmail account, you can use the `/integrations/gmail/user_labels` endpoint to list all of your labels. User-created labels will have the type `user` and Gmail’s default labels will have the type `system`. Find more details here.
Carbon Connect Updates
Added support for JSON file formats and a `maxItemsPerChunk` param to specify the number of items to include in a single chunk.

Added `cssSelectorsToSkip` to `WEB_SCRAPE` to define CSS selectors to exclude when converting HTML to plaintext.

Added `SALESFORCE` as an `enabledIntegration` on Carbon Connect.

For Salesforce, we added a param `syncFilesOnConnection` that defaults to `true` and will automatically sync all pages from a user’s Salesforce account. We’ll be adding this param to other connectors too, meaning you can automatically sync all files from connectors that don’t have built-in file selectors (Gitbook, Confluence, etc.).

This parameter has also been added to the `/integrations/oauth_url` endpoint as `sync_files_on_connection`, where it also defaults to `true`.
Freshdesk Connector is Live
All `Published` articles from an end user’s Freshdesk knowledge base are synced when connected to Carbon.

The Carbon Connect `enabledIntegrations` value is `FRESHDESK`. You can find more info here.
Speed Improvements to Hybrid Search
We improved the speed of hybrid search by 10x by creating sparse vector indexes at file upload time instead of at query time.
Steps to Enable:
Pass the following body to the `/modify_user_configuration` endpoint: { "configuration_key_name": "sparse_vectors", "value": { "enabled": true } }

Set the parameter `generate_sparse_vectors` to `true` via the `/uploadfile` endpoint.
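The two steps above can be captured as request payloads. The parameter names come from the steps themselves; everything else is illustrative:

```python
# Step 1: enable sparse vectors for the user via /modify_user_configuration.
enable_sparse_vectors = {
    "configuration_key_name": "sparse_vectors",
    "value": {"enabled": True},
}

# Step 2: request sparse vector generation when uploading via /uploadfile.
upload_params = {"generate_sparse_vectors": True}
```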
We’ll be rolling out faster hybrid search support across 3rd party connectors in the upcoming weeks.
Deleting Files based on Sync Status
You can now delete file(s) based on `sync_status` via the `delete_files` endpoint. We added 2 parameters:

`sync_statuses` - pass a list of sync statuses for file deletion. For example: { "sync_statuses": ["SYNC_ERROR", "QUEUED_FOR_SYNC"] }. When this parameter is passed, we delete all files in the `SYNC_ERROR` and `QUEUED_FOR_SYNC` statuses that belong to the end user identified by the `customer-id` header of the request.
`delete_non_synced_only` - boolean parameter that limits deletion to files that have never been synced before. For example, a previously synced Google Drive file enters the `QUEUED_FOR_SYNC` status again during a scheduled re-sync. Setting `delete_non_synced_only` to `true` would prevent this file from being deleted.
Files are deletable in all statuses except the `SYNCING`, `EVALUATING_RESYNC`, and `QUEUED_FOR_OCR` states. Including `SYNCING`, `EVALUATING_RESYNC`, or `QUEUED_FOR_OCR` in the list will result in an error response - files in these statuses must wait until they transition out of the status before they can be deleted. Find more details here.
Carbon Connect Updates
Added support for the following functionalities in Carbon Connect (React component + JavaScript SDK):
Additional embedding models (`OPENAI`, `AZURE_OPENAI`, `COHERE_MULTILINGUAL_V3` for text and audio files, and `VERTEX_MULTIMODAL` for image files).

Audio and image file support. Reference documentation on available file formats.

OCR support for PDFs from local file uploads via Carbon Connect.

Hybrid search support.
You can find details to enable any of these functionalities in our documentation:
Remove `Customer-Id` on Select Endpoints
We’re removing `customer-id` as a required header for the following endpoints, where it isn’t needed:

/auth/v1/white_labeling
/user
/webhooks
/add_webhook
/delete_webhook/{webhook_id}
/organization
Vector Database Integration
We are starting to build out direct integrations with vector database providers!
What this means:
After authenticating a vector database provider via API key, Carbon automatically synchronizes between user data sources and the embeddings within your vector database. Whenever a user file is processed, we handle the seamless update of your vector database with the latest embeddings.
You’ll have full access to all of Carbon’s API endpoints, including hybrid search if sparse vector storage is supported by your vector database.

Migrations between vector databases are made simple since Carbon provides a unified API to interface with all providers.
The first vector database integration we’re announcing is with Turbopuffer. Many more to come!
S3 Connector
We launched our S3 connector today that enables syncing objects from buckets.
The Carbon Connect `enabledIntegrations` value for S3 is `S3`. See more specifics about our S3 connector here.
File + Account Management Component (BETA)
We’ve launched a new component that enables the following:
Users to add and revoke access to accounts under each connection.
Users to view and select specific folders and files for sync.
The aim is to offer a pre-built file selector for integrations without their own.
The component is currently offered in React but we’ll add support for other frameworks soon.
You can find the npm package here. Please note it’s still in BETA so your feedback is much appreciated!
Expanding sort for user_files_v2
You can sort by `name`, `file_size`, and `last_sync` via the `order_by` field in the `user_files_v2` body. See more details here.
Support for audio file uploads via connectors
We’ve enabled support for audio files via the following connectors: S3, Google Drive, OneDrive, SharePoint, Box, Dropbox, and Zotero.
See list of supported audio files here.
Google Verification
Carbon’s Google Connector is officially Google-verified. This means users will no longer see the warning screen when authenticating with Carbon’s Google connector.
OCR Public Preview
We’ve been rolling out support for OCR, starting with PDFs uploaded locally (images and data connectors to follow).
Exposing Sync Error Reasons
We are now exposing error messages under the `sync_error_reason` field for files entering the `SYNC_ERROR` status. You can find a list of common errors here, and we’ll be updating it on an ongoing basis.
List and Sync Items from Data Sources
We’re introducing new functionalities that allow customers to synchronize and retrieve a comprehensive list of items such as files, folders, collections, articles, and more from a user’s data source. This enhancement empowers you to create an in-house file selection flow, while enabling Carbon to also provide a user-friendly file selector UI and convenient helper methods within our SDK.
You can find more details here.
Upload Chunks and Embeddings
Added the `/upload_chunks_and_embeddings` endpoint to enable uploading chunks and vectors to Carbon directly. See more specific details here.
COPYRIGHT @ 2024 JCDT DBA CARBON