Blob
- class langchain_core.documents.base.Blob
Bases: BaseMedia
Blob represents raw data by either reference or value.
Provides an interface to materialize the blob in different representations, and helps decouple the development of data loaders from the downstream parsing of the raw data.
Inspired by: https://developer.mozilla.org/en-US/docs/Web/API/Blob
Example: Initialize a blob from in-memory data
    from langchain_core.documents import Blob

    blob = Blob.from_data("Hello, world!")

    # Read the blob as a string
    print(blob.as_string())

    # Read the blob as bytes
    print(blob.as_bytes())

    # Read the blob as a byte stream
    with blob.as_bytes_io() as f:
        print(f.read())
Example: Load from memory and specify mime-type and metadata
    from langchain_core.documents import Blob

    blob = Blob.from_data(
        data="Hello, world!",
        mime_type="text/plain",
        metadata={"source": "https://example.com"},
    )
Example: Load the blob from a file
    from langchain_core.documents import Blob

    blob = Blob.from_path("path/to/file.txt")

    # Read the blob as a string
    print(blob.as_string())

    # Read the blob as bytes
    print(blob.as_bytes())

    # Read the blob as a byte stream
    with blob.as_bytes_io() as f:
        print(f.read())
- param data: bytes | str | None = None
Raw data associated with the blob.
- param encoding: str = 'utf-8'
Encoding to use when decoding the bytes into a string. Defaults to utf-8.
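To see why the encoding matters, the following plain-Python sketch decodes the same bytes with two codecs. It does not use Blob at all; the sample text and variable names are illustrative assumptions, and it only shows what decoding bytes with a given encoding means.

```python
# Plain-Python illustration of what an encoding setting controls.
# No Blob is involved; the sample text is an arbitrary assumption.
raw = "café".encode("utf-8")  # the accented character becomes two bytes

as_utf8 = raw.decode("utf-8")      # matching codec: round-trips to the original text
as_latin1 = raw.decode("latin-1")  # mismatched codec: each byte becomes one character

print(as_utf8)
print(as_latin1)
print(len(as_utf8), len(as_latin1))
```

Decoding with a codec that does not match the one used to write the bytes does not necessarily raise an error; it can silently produce different text, so it is worth setting the encoding explicitly when the data is not UTF-8.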
- param id: str | None = None
An optional identifier for the document.
Ideally this should be unique across the document collection and formatted as a UUID, but this will not be enforced.
Added in version 0.2.11.
- param metadata: dict [Optional]
Arbitrary metadata associated with the content.
- param mimetype: str | None = None
MIME type of the data (for example, text/plain). Not to be confused with a file extension.
- param path: str | PurePath | None = None
Location where the original content was found.
- as_bytes_io() → Generator[BytesIO | BufferedReader, None, None]
Read data as a byte stream.
- Return type:
Generator[BytesIO | BufferedReader, None, None]
- classmethod from_data(data: str | bytes, *, encoding: str = 'utf-8', mime_type: str | None = None, path: str | None = None, metadata: dict | None = None) → Blob
Initialize the blob from in-memory data.
- Parameters:
data (str | bytes) – the in-memory data associated with the blob
encoding (str) – Encoding to use if decoding the bytes into a string
mime_type (str | None) – if provided, will be set as the mime-type of the data
path (str | None) – if provided, will be set as the source from which the data came
metadata (dict | None) – Metadata to associate with the blob
- Returns:
Blob instance
- Return type:
Blob
- classmethod from_path(path: str | PurePath, *, encoding: str = 'utf-8', mime_type: str | None = None, guess_type: bool = True, metadata: dict | None = None) → Blob
Load the blob from a path-like object.
- Parameters:
path (str | PurePath) – path-like object pointing to the file to be read
encoding (str) – Encoding to use if decoding the bytes into a string
mime_type (str | None) – if provided, will be set as the mime-type of the data
guess_type (bool) – If True, the MIME type is guessed from the file extension when mime_type is not provided
metadata (dict | None) – Metadata to associate with the blob
- Returns:
Blob instance
- Return type:
Blob
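The guess_type parameter is described as guessing the MIME type from the file extension. As a rough illustration of what extension-based guessing looks like, the sketch below uses Python's standard mimetypes module. Whether from_path relies on this module internally is an assumption here, and guess_mime_type is a hypothetical helper, not part of langchain_core.

```python
import mimetypes
from typing import Optional


def guess_mime_type(path: str) -> Optional[str]:
    """Hypothetical helper: return the MIME type implied by the extension, or None."""
    # mimetypes.guess_type returns a (type, encoding) tuple; only the type is used here.
    mime_type, _content_encoding = mimetypes.guess_type(path)
    return mime_type


print(guess_mime_type("path/to/file.txt"))  # text/plain
print(guess_mime_type("report.pdf"))
print(guess_mime_type("data.unknownext"))   # None: the extension is not recognized
```

Because the guess depends only on the file name, a file with a misleading or missing extension gets a wrong or empty MIME type, which is presumably why an explicit mime_type takes priority over the guess.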
- property source: str | None
The source location of the blob as a string if known, otherwise None.
If a path is associated with the blob, the source defaults to the path location, unless a metadata field called “source” is explicitly set, in which case that value is used instead.
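The precedence described for source can be sketched in plain Python. The resolve_source function below is a hypothetical stand-in written only to illustrate the documented rule (metadata "source" first, then the path, otherwise None); it is not the library's implementation.

```python
from pathlib import PurePath
from typing import Any, Dict, Optional, Union


def resolve_source(
    path: Optional[Union[str, PurePath]],
    metadata: Optional[Dict[str, Any]],
) -> Optional[str]:
    """Hypothetical illustration of the documented precedence for `source`."""
    # An explicit "source" entry in the metadata takes priority over the path.
    if metadata and "source" in metadata:
        return metadata["source"]
    # Otherwise fall back to the path, rendered as a string, when one is known.
    if path is not None:
        return str(path)
    # Neither a metadata source nor a path: the source is unknown.
    return None


print(resolve_source("path/to/file.txt", {}))
print(resolve_source("path/to/file.txt", {"source": "https://example.com"}))
print(resolve_source(None, None))
```

Under this rule, a loader that fetches remote content into a temporary file can still report the original URL by setting the "source" metadata field.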