Calmcode - vector: cloud storage

How to Route Logs to Cloud Storage, or S3, with Vector


Let's update our configuration to also send logs to a cloud storage layer. We'll use Google Cloud Storage in this demo, but there are many alternative storage layers we can use as a sink.

Update the Config

To handle this, we'll add another sink to our configuration that sends the events to Google Cloud Storage. We'll call it gcloud. All the relevant details for our new gcloud component can be found in the [Vector docs](https://vector.dev/docs/reference/configuration/sinks/gcp_cloud_storage/).

[sources.in]
type = "stdin"

[transforms.parse]
inputs = ["in"]
type = "remap"
source = '''
  .message = parse_json(.message) ?? {}
'''

[transforms.filtered]
inputs = ["parse"]
type = "filter"
condition = 'length(.message) > 0 ?? true'

[sinks.file]
inputs = ["filtered"]
type = "file"
compression = "none"
encoding.codec = "ndjson"
path = "/tmp/vector-%Y-%m-%d.log"

[sinks.out]
inputs = ["filtered"]
type = "console"
encoding.codec = "text"

[sinks.gcloud]
inputs = ["filtered"]
type = "gcp_cloud_storage"
bucket = "vector-demo-logs"
credentials_path = "credentials.json"
key_prefix = "date=%F/"
compression = "none"
encoding.codec = "ndjson"

[sinks.gcloud.buffer]
type = "memory"

There are a few things worth noting.

  1. Google Cloud requires that you pass a file with credentials as an authentication mechanism. That's configured in the credentials_path setting.
  2. We are going to be using the vector-demo-logs bucket. In this bucket, objects are written under a folder named after the formatted date defined in the key_prefix setting. The %F specifier expands to a date such as 2024-01-31, so the folder would be called date=2024-01-31/.
  3. These logs are going to be buffered. By default, events are collected into batches and a batch is written out roughly every 5 minutes, or sooner if the batch fills up first. More details on this mechanic can be found in the docs.
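
If five minutes is too long to wait while experimenting, the batching behaviour can be tuned on the sink itself. The fragment below is a minimal sketch that assumes the batch options timeout_secs and max_bytes are available in your Vector version. The values are illustrative, not recommendations.

```toml
# Assumed tuning for the gcloud sink defined above; values are illustrative.
[sinks.gcloud.batch]
timeout_secs = 60      # flush a batch after at most 60 seconds
max_bytes = 1000000    # or earlier, once roughly 1 MB of events has been collected
```

Shorter timeouts produce more, smaller objects in the bucket, so a value this low is mostly useful for demos.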

If you want to send the data to S3 instead, you'll want to explore the documentation for Vector's aws_s3 sink.
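
For S3, a sink along these lines may serve as a starting point. This is a sketch, not a tested configuration: the sink name s3, the bucket name, and the region are assumptions, and it presumes that AWS credentials are picked up from the environment rather than from a credentials_path setting. The encoding.codec value mirrors the gcloud sink above and may need adjusting for newer Vector versions.

```toml
# Hypothetical S3 counterpart to the gcloud sink; names and region are assumptions.
[sinks.s3]
inputs = ["filtered"]
type = "aws_s3"
bucket = "vector-demo-logs"   # assumed bucket name, reused from the GCS example
region = "eu-west-1"          # assumed region; use the one your bucket lives in
key_prefix = "date=%F/"       # same date-based folder layout as before
compression = "none"
encoding.codec = "ndjson"
```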

Repeating the Experiment

If you'd like to repeat the experiment, you may be interested in the watch command line tool, which can resend the requests to the server at a fixed interval.