Am I the only one that got, "This article smells like it was written by an AI told to 'compare these two products'"?
Something about the sentence structure is just off-putting.
killme2008 48 minutes ago [-]
The author is not a native speaker; I promise it's not an AI-written article, though it did get some minor review from AI :)
atombender 5 hours ago [-]
How does Greptime handle dynamic schemas where you don't know most of the shape of the data upfront?
Where I work, we have maybe a hundred different sources of structured logs: our own applications, Kubernetes, databases, CI/CD software, lots of system processes. There's no common schema other than the basics (timestamp, message, source, Kubernetes metadata). Apps produce all sorts of JSON fields, and we have thousands and thousands of fields across all these apps.
It'd be okay to define a small core subset, but we'd need a sensible "catch all" rule for the rest. All fields need to be searchable, but it's of course OK if performance is a little worse for non-core fields, as long as you can go into the schema and explicitly add it in order to speed things up.
Also, how does Greptime scale with that many fields? Does it do fine with thousands of columns?
I imagine it would be a good idea to have one table per source. Is it easy/performant to search multiple tables (union ordered by time) in a single query?
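To make the last question concrete: a "union ordered by time" over per-source tables amounts to a k-way merge of streams that are each already sorted by timestamp. The sketch below is a toy illustration in plain Python, not how GreptimeDB or any other database executes such a query; the table names and rows are made up.

```python
import heapq
from operator import itemgetter

# Hypothetical per-source "tables": rows are (timestamp, source, message),
# and each table is already sorted by timestamp.
app_logs = [
    (1, "app", "started"),
    (4, "app", "request handled"),
]
k8s_logs = [
    (2, "k8s", "pod scheduled"),
    (5, "k8s", "pod ready"),
]
db_logs = [
    (3, "db", "connection opened"),
]

def union_by_time(*tables):
    """Lazily merge time-sorted row streams into one time-ordered stream."""
    return heapq.merge(*tables, key=itemgetter(0))

merged = list(union_by_time(app_logs, k8s_logs, db_logs))
for ts, source, message in merged:
    print(ts, source, message)
```

The merge is lazy and only holds one pending row per table, which is why this pattern stays cheap as long as each input is already time-sorted.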
killme2008 4 hours ago [-]
Thanks for your question.
GreptimeDB, like MongoDB, is schemaless. When ingesting data via OTEL or its gRPC SDKs, it automatically creates tables by inferring the schema and dynamically adds new columns as needed.
Second, I'd prefer wide tables that consolidate all sources, for easier management and scalability. With GreptimeDB's columnar storage based on Parquet, unused columns don't incur storage costs.
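As a rough illustration of the idea described above (columns added on the fly, one wide table shared by all sources), here is a toy in-memory sketch in Python. The `WideTable` class and the field names are hypothetical and are not GreptimeDB's API. It also does not model Parquet encoding: this toy stores explicit `None` values, whereas a real columnar format encodes missing values compactly.

```python
class WideTable:
    """Toy columnar table that adds columns as new fields appear."""

    def __init__(self):
        self.columns = {}   # column name -> list of values, one per row
        self.row_count = 0

    def insert(self, record):
        # Add any column seen for the first time, backfilled with None
        # for the rows that were inserted before it existed.
        for name in record:
            if name not in self.columns:
                self.columns[name] = [None] * self.row_count
        # Append one value per column; fields missing from this record stay None.
        for name, values in self.columns.items():
            values.append(record.get(name))
        self.row_count += 1

table = WideTable()
table.insert({"ts": 1, "message": "started", "source": "app"})
table.insert({"ts": 2, "message": "pod ready", "source": "k8s", "pod": "web-0"})
table.insert({"ts": 3, "message": "slow query", "source": "db", "duration_ms": 412})

print(sorted(table.columns))
print(table.columns["pod"])
```

Each source only fills the columns it knows about, so the table grows wider without any upfront schema definition.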
atombender 3 hours ago [-]
Thanks, that seems promising. So much of the documentation is schema-oriented, I didn't see that it supported dynamic schemas.
I find it interesting that Greptime is completely time-oriented. I don't think you can create tables without a time PK? The last time I needed log storage, I ended up picking ClickHouse, because it has no such restrictions on primary keys. We use non-time-based tables all the time, as well as dictionaries. So it seems Greptime is a lot less flexible?
killme2008 3 hours ago [-]
Yes, GreptimeDB requires a time index column for optimized storage and querying. It isn't a primary key constraint, though; it's an independent table constraint.
Could you elaborate on why you find this inconvenient? I assumed logs, for example, would naturally include a timestamp.
firesteelrain 6 hours ago [-]
Is there any reason to use this on Azure over its cloud-native options? AKS, for example, has fluentd built into the ama-pod, which already sends logs to Azure Monitor/LogA. Azure Managed Grafana can take in Kusto queries. AMA can monitor VMs. Further, you can use DCE/DCRs for custom logs. Azure provides a native ElasticSearch offering too. It seems to own this market.
You can control and forecast costs predictably with these models.
killme2008 4 hours ago [-]
Agree. Leveraging capabilities provided by cloud vendors is always a good idea. However, as the scale grows, cost inevitably becomes an issue. Third-party solutions often offer cost advantages because they support multi-cloud deployments and are optimized for specific scenarios.
client4 5 hours ago [-]
For logs I'd be more likely to choose https://www.gravwell.io as it's log agnostic and I've seen it crush 40Tb/s a day, whereas it looks like greptime is purpose-tuned for metrics and telemetry data.
dijit 4 hours ago [-]
is gravwell open source?
(it seems greptime is.)
chreniuc 5 hours ago [-]
How does it compare to openobserve?
reconnecting 5 hours ago [-]
I'm always skeptical toward software companies with an outdated year in the footer.
killme2008 4 hours ago [-]
Thanks for pointing it out! The footer has been updated.