Maintenance and Debugging
Note
This information is intended for STRATO node administrators.
Node Health and Maintenance
Monitoring STRATO health at the node and network levels helps discover issues proactively through warning alerts.
Automated STRATO Health Monitoring
BlockApps recommends automating STRATO network health monitoring. Consider using third-party uptime monitoring software (such as NodePing, CloudRadar Monitoring, Scalyr, or AWS CloudWatch) to periodically poll the health endpoint of the Central Monitoring Dashboard's API, and set up alerts.
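As a minimal sketch of such a poll (assuming only that the `/health` endpoint returns HTTP 200 when healthy; the address placeholder and alert wording are illustrative, not part of STRATO):

```shell
# Decide whether an alert is needed from the HTTP status code returned by
# the CMD /health endpoint. Assumes 200 == healthy; anything else alerts.
evaluate_health_status() {
  if [ "$1" = "200" ]; then
    echo "CMD /health OK"
  else
    echo "ALERT: CMD /health returned HTTP $1"
    return 1
  fi
}

# Live usage (placeholder address; run from cron or your monitoring tool):
#   status=$(curl -s -o /dev/null -w '%{http_code}' "https://<cmd_address>/health")
#   evaluate_health_status "$status"
evaluate_health_status 200
```

A dedicated monitoring service gives you retries, history, and alert routing on top of this, which is why it is preferred over a bare cron job.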
Health Checks
STRATO health checks are performed at two levels:
- Node-level health checks include:
- RAM usage (warning alert triggered on usage of 80% or more)
- Disk usage (warning alert triggered on disk space usage of 80% or more)
- STRATO Vault in-memory password is set (warning alert triggered if unset)
- All STRATO Docker containers are up and healthy (warning alert triggered if any container is unhealthy or down)
- Other
- Network-level health checks include:
- All monitored nodes are healthy
- None of the monitored nodes have active warning alerts
- All monitored nodes are in sync (latest block, sequence, and round numbers match; a minor difference in numbers is admissible)
- The network does not have any queued transactions that cannot be processed (a "stalled network")
Based on the warning alerts, a STRATO node administrator has a time window to prevent potential problems caused by issues such as low disk space or low system memory.
Continuously monitoring the network-level health (the CMD's GET /health endpoint) gives an administrator peace of mind and an overall status of the STRATO network.
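The 80% disk threshold above can also be checked by hand on any node; a minimal sketch (only the threshold comes from the health checks listed above, the script and its messages are illustrative):

```shell
# Warn when root-filesystem usage reaches the 80% threshold used by the
# node-level health check. With portable df -P output, usage % is column 5.
THRESHOLD=80
usage=$(df -P / | awk 'NR==2 { gsub("%", "", $5); print $5 }')
if [ "$usage" -ge "$THRESHOLD" ]; then
  echo "WARNING: disk usage at ${usage}%"
else
  echo "disk usage OK (${usage}%)"
fi
```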
Debugging
Issues with a STRATO node or network should be investigated and debugged by following the steps below.
Identify Problematic Nodes
These steps should be used to identify which nodes are causing problems on the network level.
1. Check CMD's public health endpoint: `GET https://<cmd_address>/health`
   - Check for the `needsAttention` value being `true`. Proceed to the next step if true; otherwise, if you still experience issues, proceed to Node Debugging.
2. Check CMD's protected health endpoint: `GET https://<cmd_address>/server/health` (requires authentication - sign in at https:/// to obtain the session cookie in your browser first.)
   - Check for warning messages (the `warnings` array). The warning messages describe the issue and show which nodes have the problem.
   - Check for any unhealthy nodes listed (the `unHealthyNodes` array). This provides information on which of the monitored nodes are unhealthy. Proceed to Node Debugging.
   - Check for the `isOverallStalling` value being `true`. This usually indicates a problem with consensus in the STRATO network.
     - Make sure at least (½)n + 1 validator nodes are up, running, and healthy.
     - Make sure the nodes can communicate with each other on ports 30303/TCP and 30303/UDP (check your private network configuration).
     - Proceed to Node Debugging to look for P2P issues in the logs.
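These checks can be scripted against the response body; a grep-based sketch, where the field names (`unHealthyNodes`, `isOverallStalling`) come from the endpoint described above and the sample payload is invented for illustration:

```shell
# Triage a CMD /server/health response body. jq would be cleaner, but plain
# grep keeps the sketch dependency-free.
triage_server_health() {
  resp="$1"
  if echo "$resp" | grep -q '"isOverallStalling"[[:space:]]*:[[:space:]]*true'; then
    echo "network stalling: check consensus and ports 30303/TCP+UDP"
  fi
  if echo "$resp" | grep -q '"unHealthyNodes"[[:space:]]*:[[:space:]]*\[[^]]'; then
    echo "unhealthy nodes present: proceed to Node Debugging"
  fi
}

# Invented sample payload:
triage_server_health '{"warnings":[],"unHealthyNodes":["node2"],"isOverallStalling":false}'
```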
Node Debugging
These steps should be used to identify the problems on the node level (if not identified on the network level.)
- Check STRATO's protected health endpoint: `GET https://<strato_address>/apex-api/health` (requires authentication - sign in at https:/// to obtain the session cookie in your browser first.)
- Check the cloud monitoring status of your VM (e.g. AWS CloudWatch and Instance Status Checks on AWS)
- SSH to the machine and try the following steps:
  - Check system health:
    - Check CPU and RAM usage (with `htop` or `top`)
    - Check disk usage (with `df -h`)
  - Check the Docker container health statuses (`docker ps`, and look for the health statuses in parentheses):
    - If one or more containers are unhealthy, check their logs (e.g. with `docker logs --tail=100 <container_name>`). This should output a message indicating the problem with the container's health.
    - In most cases, restarting the container (with `docker restart <container_name>`) does the trick. Doing so is safe in terms of data persistence.
      - In some cases, the dependency container may need to be restarted as well.
      - When restarting the STRATO Vault wrapper container, make sure to re-enter the STRATO Vault password for the node.
    - If restarting the container (and its dependencies) does not bring the container back up healthy, please reach out to us via our support channels.
  - Search the STRATO logs for possible problem causes:
    - Most of the STRATO Docker containers have their logs piped to Docker logs:
      - To check the Docker logs of one container, use `docker logs --tail=500 <container_name>`.
      - To check the Docker logs of all STRATO containers at once, run `docker-compose -p strato logs` from the directory containing STRATO's `docker-compose.yml` (usually the STRATO Getting Started directory).
      - When experiencing problems with the API or HTTP access to the node, it is recommended to start with the nginx container logs.
    - STRATO Core services logs can be found in log files located in the `strato_strato_1` container:
      - To check the STRATO Core services logs: `docker exec -it strato_strato_1 bash`, then list the files with `ls logs/`, then tail the log file of interest.
      - For network issues, it is recommended to start with the `logs/strato-p2p` logs.
      - For transaction-related issues, it is recommended to start with the `logs/strato-sequencer` and `logs/vm-runner` logs (for more information about the log files, refer to the STRATO Logs section).
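When skimming container logs for the issues above, a coarse filter helps; a sketch (the error keywords are illustrative, not an exhaustive list of STRATO log levels):

```shell
# Filter log lines for likely problems; reads stdin, prints matching lines.
scan_for_errors() {
  grep -iE 'error|fatal|panic|exception' || echo "no obvious errors in input"
}

# Live usage (requires a running STRATO node):
#   docker logs --tail=500 strato_nginx_1 2>&1 | scan_for_errors

printf 'INFO listener started\nERROR connection refused\n' | scan_for_errors
```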
STRATO Logs
Service | Container name | Sub-process | Logs location | Logs Description |
---|---|---|---|---|
Nginx | strato_nginx_1 | - | container logs | incoming HTTP requests and access token verification |
STRATO Core | strato_strato_1 | vm-runner | in container, logs/vm-runner | STRATO VM logs (SolidVM/EVM) |
 | | strato-sequencer | in container, logs/strato-sequencer | the logs of the sequencer, which handles the ordering of blocks, transactions, and private chains coming from the API and P2P before sending them to the VM, and also runs all PBFT logic |
 | | strato-p2p | in container, logs/strato-p2p | STRATO P2P logs (communication between peers) |
 | | slipstream | in container, logs/slipstream | the logs of Slipstream, the indexer of blockchain data inserted into the Postgres 'cirrus' db |
 | | ethereum-discover | in container, logs/ethereum-discover | P2P discovery logs (STRATO network peer auto-discovery) |
 | | logserver | in container, logs/logserver | the logs of the web server providing HTTP access to the node's logs when SERVE_LOGS=true (optional) |
 | | strato-adit | in container, logs/strato-adit | the logs of the proof-of-work miner, which is just a pass-through for new blocks sent from the VM to the sequencer when PBFT is enabled (default) |
 | | strato-api | in container, logs/strato-api | STRATO API access logs (except /cirrus/ , /apex-api/ and /health endpoints) |
 | | strato-api-indexer | in container, logs/strato-api-indexer | the logs of the api-indexer, which inserts blocks, transactions, and private chain info into the Postgres 'eth_' db |
 | | strato-p2p-indexer | in container, logs/strato-p2p-indexer | the logs of the p2p-indexer, which inserts blocks, transactions, and private chain info into Redis |
 | | strato-setup | in container, logs/strato-setup | STRATO Core's initialization script logs |
 | | strato-txr-indexer | in container, logs/strato-txr-indexer | the logs of the txr-indexer, which handles private chain governance events |
Vault | strato_vault-wrapper_1 | - | container logs | STRATO Vault logs (internal API requests from STRATO Core related to transaction signing using the user keys) |
Apex | strato_apex_1 | - | container logs | STRATO health calculation logs, /apex-api/ access logs |
PostgREST | strato_postgrest_1 | - | container logs | PostgREST server logs representing the Cirrus API (/cirrus/ endpoints) access logs |
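To skim several of the STRATO Core log files from the table in one pass, a loop like the following can help (the service list is a subset taken from the table; the `docker exec` line requires a running node, so it is shown commented):

```shell
# Print a header per STRATO Core log file of interest; on a live node,
# uncomment the docker exec line to tail each file.
for svc in vm-runner strato-sequencer strato-p2p slipstream; do
  echo "== logs/${svc} =="
  # docker exec strato_strato_1 tail -n 20 "logs/${svc}"
done
```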
Common Problems
Common problems include:
- The virtual machine, Docker service, or the Vault container was restarted; STRATO alerts that the STRATO Password is unset and requires the Vault password to be re-entered by the administrator.
  Solution: Re-enter the Vault password with `./strato --set-password` when in the `strato-getting-started/` directory.
- The TLS certificate is outdated; HTTP calls to the API or the STRATO Management Dashboard return an expired-certificate warning.
  Solution: Check the crontab job running the letsencrypt-tool's renew script. Execute it manually, find the problem, and fix the command in crontab.
- The STRATO node peers cannot connect to each other (the node starts with 0 or 1 blocks and no peers on the SMD's main page.)
  Solution: Check your private network configuration (e.g. Security Group settings in AWS) - ports 30303/TCP and 30303/UDP should be reachable between the nodes for inbound and outbound traffic.
- A Low Disk Space alert is triggered for the node.
  Solution: Consider increasing the data volume size for your VM or cleaning up any unused data.
- A Low Memory alert is triggered for the node.
  Solution: Make sure your host machine meets the system requirements; check which processes use the most significant amount of RAM; consider restarting the STRATO Core container to see if this solves the problem; create a support ticket in the BlockApps Support Service Desk to let us know about the issue causing the high memory usage.
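For the peer-connectivity problem above, TCP reachability of port 30303 can be probed from one node toward another; a bash-only sketch (the peer address is a placeholder, and UDP reachability needs a separate tool such as your cloud provider's reachability checks):

```shell
# Probe a peer's 30303/TCP P2P port using bash's /dev/tcp with a timeout.
# Port number is from this document; everything else is illustrative.
check_p2p_tcp() {
  if timeout 3 bash -c "echo > /dev/tcp/$1/30303" 2>/dev/null; then
    echo "$1:30303/TCP reachable"
  else
    echo "$1:30303/TCP NOT reachable"
  fi
}

check_p2p_tcp 127.0.0.1   # replace with the peer node's address
```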
Unresolved Problems
If you experience problems that cannot be resolved using the steps provided in this document, please consider creating a support ticket on BlockApps Support Service Desk so we can assist you promptly.