Basics
What is the relationship between Bacalhau and Expanso?
Bacalhau is an open-source, distributed computing platform that runs compute jobs close to where data is located. It was created by Expanso, an enterprise software company, to address challenges with processing large-scale datasets efficiently, securely and inexpensively.
Expanso commercially supports and extends the capabilities of the Bacalhau platform. It offers managed services that help businesses adopt, operate and integrate Bacalhau for production use across their infrastructure environments including on-prem, edge, and clouds.
Some key aspects of the relationship:
- Bacalhau is an open-source software platform and is available on Github.
- Expanso develops and maintains the trademark and binaries for Bacalhau.
- The core software, controls, and programming model that make up Bacalhau were created by Expanso, but are contributed to by many people in the Bacalhau ecosystem.
- Expanso provides verified binaries, commercial support, SLAs, training and professional services to assist customers in leveraging Bacalhau's full capabilities.
- Expanso engineering actively contributes new features and enhancements to the open source Bacalhau codebase based on production deployments.
In summary, Bacalhau is the open-source software platform, while Expanso drives its development and offers commercial support for enterprise usage of the technology.
What problem does Expanso solve?
Expanso offers a distributed platform designed to address the challenges associated with working on big data in an increasingly distributed world. The team at Expanso built and leverages the open-source project Bacalhau to make big data processing faster, cheaper, and more secure.
Expanso aims to solve several key problems that organizations face when working with big data:
Some key aspects of the relationship:
- Distributed data: Traditional data systems are centralized, but in today’s world, data is generated across multiple devices and processed in various locations. Expanso ensures that jobs run where the data is created, reducing the need to move raw data, decreasing data transfer costs, and preventing redundant storage.
- Management oversight: Managing and monitoring distributed jobs can be a headache. Expanso not only ensures these jobs run smoothly but also gives users a clear view of the status of all their tasks.
- Cost reduction: Expanso cuts down on data movement and bandwidth costs. Further, by tapping into unused computing power, it can reduce the need to create new infrastructure just for data processing, and driving down expenses.
- Flexible scalability: Data demands can vary. Expanso can dynamically adjust job sizes, offering businesses the flexibility to balance processing time and costs.
- Regulatory event avoidance: Moving data can require navigating complex regulatory environments, something even big businesses might not see coming. Expanso limits or eliminates data movement altogether and provides detailed audit logs. Additionally, teams can sanitize, filter, or eliminate sensitive data, reducing the likelihood of unintentionally transferring or storing data that can trigger penalties like personally identifiable information (PII).
Who should use Expanso?
Expanso is designed to tackle big data challenges across a spectrum of sectors and applications.
Here are just some of the scenarios where Expanso is valuable:
Some key aspects of the relationship:
- Processing “immobile” data: If you're into data engineering, machine learning, analytics, or scientific computing, where managing petabytes of data is the norm, processing data in place will be required to gain insights quickly and reliably.
- Dealing with distributed data sources: Expanso is well-suited for topologies where data is created on 10 or more devices, or 100GB+ in aggregate. There are many such scenarios, but some examples include expansive IoT networks or virtual machines spread across zones and regions.
- For navigating data privacy & security: Many industries have strict regulatory compliance around data movement. Expanso is built to enable processing over that data with full recognition of the requirements of data gravity.
- Faster information from the edge: With more devices and information coming in from the network fringe, Expanso can help provide real-time analytics and insights on devices, with arbitrary query and execution capabilities.
- Reducing data transfer and storage costs: Expanso offers multiple ways to reduce data transfer and storage costs. By only moving the data required, organizations can reduce the cost in excessive data transfer and storage.
In essence, Bacalhau is your go-to for handling the complexities of distributed big data. And with Expanso backing it, you get the added perks of validated binaries, detailed security build information, and an SLA for business critical support.
How is Expanso being used?
Expanso users and customers are now able to perform:
- Improved Log Management: Companies are using Bacalhau to handle ever expanding application logs. Through Expanso, users not only manage this data but also filters out sensitive information, and allows users to glean insights using SQL queries or regex across different services.
- Machine Learning at the Edge:
- Training: Bacalhau supports machine learning training directly on remote edge devices, reducing the need for centralization of all data before building a model.
- Inference: Models are sent to edge devices for accurate real-time predictions. The platform handles both batch and long-running inference tasks, including allowing for data pre-processing, and enabling regular model updates.
- Distributed Data Warehousing: By acting as an intermediary, Bacalhau can run SQL queries over multiple data sources, essentially crafting a unified virtual data warehouse.
- Infrastructure Insight: Organizations are tapping into OSQuery through Bacalhau to conduct on-the-spot queries on devices and machines. This capability is further boosted with the ability to execute arbitrary commands on these devices without requiring remote shell access, enabling more and reliable fleet monitoring and management.
- Tackling Geographically Distributed Data Files: For enterprises with files scattered across different zones, regions, or cloud providers, Bacalhau allows data processing close to geographically distributed buckets. This accelerates traditional ETL processes, leading to quicker insights and improved data management.
- Resilient in Complex Networking Environments: In topologies such as edge or IoT where network connections are unstable, Bacalhau ensures that jobs are executed without reliable, courtesy of Bacalhau’s decentralized queuing and coordination.
Cross-Organizational Machine Learning: In industries where data regulations prevent sharing of data even for just model training, organizations lose out on the ability to group data together for more accurate models. With Bacalhau, teams can collaborate and train models, while providing audit logs and restricted permissions, enabling stringent data oversight, and without the need to exchange raw data.
How is Expanso priced?
Pricing varies per deployment taking into consideration aspects like on-prem vs. cloud, number of clusters, and the preferred hosting model. Please contact us to discuss pricing options.
Is Expanso available today?
Yes, Bacalhau is in general availability and is on a quarterly release schedule. Expanso releases Bacalhau builds weekly, and full version releases once a quarter.
I have some questions. What’s the easiest way to connect?
Reaching out is simple! Here are some options:
- Send us a message via our contact form
- Join our slack channel. You can chat with the Bacalhau community or message us directly.
If you prefer a face-to-face approach, hop into our bi-weekly office hour hosted by our Expanso Team. It’s a live Q&A, making it a great opportunity to ask detailed questions about Bacalhau and Expanso and get connected to the right person.
Deployment
Is Expanso available on cloud hosting platforms?
Expanso’s Bacalhau platform offers robust versatility in deployment.
- On-premise & cloud flexibility: Bacalhau can be seamlessly set up both (or either) on local infrastructure or cloud environments, giving users the liberty to choose their preferred deployment avenue.
- Cloud-agnostic approach: The platform's design doesn't favor any specific cloud provider. Instead, it’s structured to execute tasks across various public cloud environments. Whether you're using AWS, GCP, Azure, or any other provider, Bacalhau is equipped to handle cross-region and cross-account task execution on your existing infrastructure.
- Autonomous in disconnected scenarios: Even in settings where cloud connectivity might be intermittent, Bacalhau's distributed model ensures consistent and uninterrupted operations.
- Networking compatibility: Bacalhau's capabilities aren’t limited by networking constraints. As long as Bacalhau nodes are network-accessible to each other, whether it's through private networking, VPNs, or other secure channels, it operates efficiently. This implies that users can harness global networks of specialized instance types (like Compute Optimized or GPU-enabled) provided by cloud services like AWS, GCP, and Azure. This facilitates running analytics tasks in proximity to cloud-hosted data, all while simultaneously operating their on-site setups.
In essence, Expanso bridges the gap between on-premises and cloud infrastructures, enabling users to capitalize on the best of both worlds and optimize their big data management and analytics processes.
Can we deploy Expanso in our on-premises datacenter or cluster, apart from the cloud?
Expanso offers full support for on-premise setups separate from cloud deployments.
- Built for private environments: The platform is consciously built to cater to organizations that prefer or require keeping their operations internal. This ensures data security, control, and regulatory compliance for sensitive industries or tasks.
- Distributed & autonomous: Bacalhau’s distributed design thrives even in isolated settings. As a result, Bacalhau can run tasks unhindered even in completely air-gapped environments.
- End-to-end support: Expanso isn’t just about software; they extend full-fledged services to facilitate the deployment process. From the initial setup, integration with existing systems, to ongoing operations and governance, Expanso’s platform and team ensures everything aligns with the specific needs of isolated deployments.
In a nutshell, if an organization desires to keep its big data operations in-house and independent of external networks, Expanso’s Bacalhau platform is fully equipped and designed to accommodate such requirements.
Security
Does Expanso persist my data?
At Expanso, your data security stands as the cornerstone of our vision.
- Self-hosted Bacalhau platform: Bacalhau operates under a non-persistence policy. It does not, by default, retain or store any customer data. The system’s design and architecture emphasize data decentralization and sovereignty, ensuring that clients maintain full autonomy and control over their data, irrespective of the environment.
- Update checks: To ensure that users get the latest features and security patches, Bacalhau performs daily checks with Expanso's servers to notify if there are any updates. This interaction is purely for software version checking, and no personal data or specific details about your deployment are transmitted or stored. For users who prefer not to have these checks, they can be easily disabled. To do so, simply follow the instructions provided in here.
- Hosted Bacalhau service: When customers choose to use Expanso's hosted Bacalhau service or allow Expanso to monitor their servers for them, Expanso respects data boundaries, meaning it does not access, utilize, or distribute customer data without explicit consent. Expanso retains information only in the context of running the servers for the purpose of ensuring reliable service. Exceptions are only made in scenarios of fraud prevention, abuse mitigation, or legal compliance.
In essence, Expanso is committed to ensuring that user data remains secure, private, and within the user’s control. Customers can trust the platform to uphold the highest standards of data sovereignty. Read more in our Privicy Policy.
Create a more robust business today
Ready to get started?
Ready to get started?
Looking for something else? Check out our Help Center.