Maintenance and Debugging
Note
This information is intended for STRATO node administrators.
Node Health and Maintenance
Monitoring STRATO health at the node and network levels helps discover issues proactively through warning alerts.
Automated STRATO Health Monitoring
BlockApps recommends automating STRATO network health monitoring. Consider using third-party uptime monitoring software (such as NodePing, CloudRadar Monitoring, Scalyr, or AWS CloudWatch) to periodically poll the health endpoint of the Central Monitoring Dashboard's API, and set up alerts.
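As a minimal sketch of such a poll (assuming only that the `/health` endpoint returns HTTP 200 when healthy; the address placeholder and alert wording are illustrative, not part of STRATO):

```shell
# Decide whether an alert is needed from the HTTP status code returned by
# the CMD /health endpoint. Assumes 200 == healthy; anything else alerts.
evaluate_health_status() {
  if [ "$1" = "200" ]; then
    echo "CMD /health OK"
  else
    echo "ALERT: CMD /health returned HTTP $1"
    return 1
  fi
}

# Live usage (placeholder address; run from cron or your monitoring tool):
#   status=$(curl -s -o /dev/null -w '%{http_code}' "https://<cmd_address>/health")
#   evaluate_health_status "$status"
evaluate_health_status 200
```

A dedicated monitoring service gives you retries, history, and alert routing on top of this, which is why it is preferred over a bare cron job.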
Health Checks
STRATO health checks are performed at two levels:
- Node-level health checks include:
- RAM usage (warning alert triggered on usage of 80% or more)
- Disk usage (warning alert triggered on disk space usage of 80% or more)
- STRATO Vault in-memory password is set (warning alert triggered if unset)
- All STRATO Docker containers are up and healthy (warning alert triggered if any container is unhealthy or down)
- Other
- Network-level health checks include:
- All monitored nodes are healthy
- None of the monitored nodes have active warning alerts
- All monitored nodes are in sync (latest block, sequence, and round numbers match; a minor difference in numbers is admissible)
- The network does not have any queued transactions that cannot be processed (a "stalled network")
Based on the warning alerts, a STRATO node administrator has a time window to prevent potential problems caused by issues such as low disk space or low system memory.
Continuously monitoring the network-level health (the CMD's GET /health endpoint) gives an administrator peace of mind and an overall status of the STRATO network.
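The 80% disk threshold above can also be checked by hand on any node; a minimal sketch (only the threshold comes from the health checks listed above, the script and its messages are illustrative):

```shell
# Warn when root-filesystem usage reaches the 80% threshold used by the
# node-level health check. With portable df -P output, usage % is column 5.
THRESHOLD=80
usage=$(df -P / | awk 'NR==2 { gsub("%", "", $5); print $5 }')
if [ "$usage" -ge "$THRESHOLD" ]; then
  echo "WARNING: disk usage at ${usage}%"
else
  echo "disk usage OK (${usage}%)"
fi
```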
Debugging
Issues with a STRATO node or network should be investigated and debugged by following the steps below.
Identify Problematic Nodes
These steps should be used to identify which nodes are causing problems on the network level.
1. Check CMD's public health endpoint: `GET https://<cmd_address>/health`
   - Check for the `needsAttention` value being `true`. Proceed to the next step if true; otherwise, if you still experience issues, proceed to Node Debugging.
2. Check CMD's protected health endpoint: `GET https://<cmd_address>/server/health` (requires authentication - sign in at https:/// to obtain the session cookie in your browser first.)
   - Check for warning messages (the `warnings` array). The warning messages describe the issue and show which nodes have the problem.
   - Check for any unhealthy nodes listed (the `unHealthyNodes` array). This provides information on which of the monitored nodes are unhealthy. Proceed to Node Debugging.
   - Check for the `isOverallStalling` value being `true`. This usually indicates a problem with consensus in the STRATO network.
     - Make sure at least (½)n + 1 validator nodes are up, running, and healthy.
     - Make sure the nodes can communicate with each other on ports 30303/TCP and 30303/UDP (check your private network configuration).
     - Proceed to Node Debugging to look for P2P issues in the logs.
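These checks can be scripted against the response body; a grep-based sketch, where the field names (`unHealthyNodes`, `isOverallStalling`) come from the endpoint described above and the sample payload is invented for illustration:

```shell
# Triage a CMD /server/health response body. jq would be cleaner, but plain
# grep keeps the sketch dependency-free.
triage_server_health() {
  resp="$1"
  if echo "$resp" | grep -q '"isOverallStalling"[[:space:]]*:[[:space:]]*true'; then
    echo "network stalling: check consensus and ports 30303/TCP+UDP"
  fi
  if echo "$resp" | grep -q '"unHealthyNodes"[[:space:]]*:[[:space:]]*\[[^]]'; then
    echo "unhealthy nodes present: proceed to Node Debugging"
  fi
}

# Invented sample payload:
triage_server_health '{"warnings":[],"unHealthyNodes":["node2"],"isOverallStalling":false}'
```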
Node Debugging
These steps should be used to identify the problems on the node level (if not identified on the network level.)
- Check STRATO's protected health endpoint: `GET https://<strato_address>/apex-api/health` (requires authentication - sign in at https:/// to obtain the session cookie in your browser first.)
- Check the cloud monitoring status of your VM (e.g. AWS CloudWatch and Instance Status Checks on AWS)
- SSH to the machine and try the following steps:
  - Check system health:
    - Check CPU and RAM usage (with `htop` or `top`)
    - Check disk usage (with `df -h`)
  - Check the Docker container health statuses (`docker ps`, and look for the health statuses in parentheses):
    - If one or more containers are unhealthy, check their logs (e.g. with `docker logs --tail=100 <container_name>`). This should output a message indicating the problem with the container's health.
    - In most cases, restarting the container (with `docker restart <container_name>`) does the trick. Doing so is safe in terms of data persistence.
      - In some cases, the dependency container may need to be restarted as well.
      - When restarting the STRATO Vault wrapper container, make sure to re-enter the STRATO Vault password for the node.
    - If restarting the container (and its dependencies) does not bring the container back up healthy, please reach out to us via our support channels.
  - Search the STRATO logs for possible problem causes:
    - Most of the STRATO Docker containers have their logs piped to Docker logs:
      - To check the Docker logs of one container, use `docker logs --tail=500 <container_name>`.
      - To check the Docker logs of all STRATO containers at once, run `docker-compose -p strato logs` from the directory containing STRATO's `docker-compose.yml` (usually the STRATO Getting Started directory).
      - When experiencing problems with the API or HTTP access to the node, it is recommended to start with the nginx container logs.
    - STRATO Core services logs can be found in log files located in the `strato_strato_1` container:
      - To check the STRATO Core services logs: `docker exec -it strato_strato_1 bash`, then list the files with `ls logs/`, then tail the log file of interest.
      - For network issues, it is recommended to start with the `logs/strato-p2p` logs.
      - For transaction-related issues, it is recommended to start with the `logs/strato-sequencer` and `logs/vm-runner` logs (for more information about the log files, refer to the STRATO Logs section).
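When skimming container logs for the issues above, a coarse filter helps; a sketch (the error keywords are illustrative, not an exhaustive list of STRATO log levels):

```shell
# Filter log lines for likely problems; reads stdin, prints matching lines.
scan_for_errors() {
  grep -iE 'error|fatal|panic|exception' || echo "no obvious errors in input"
}

# Live usage (requires a running STRATO node):
#   docker logs --tail=500 strato_nginx_1 2>&1 | scan_for_errors

printf 'INFO listener started\nERROR connection refused\n' | scan_for_errors
```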
STRATO Logs
Service | Container name | Sub-process | Logs location | Logs Description |
---|---|---|---|---|
Nginx | strato_nginx_1 | - | container logs | incoming HTTP requests and access token verification |
STRATO Core | strato_strato_1 | vm-runner | in container, logs/vm-runner | STRATO VM logs (SolidVM/EVM) |
 | | strato-sequencer | in container, logs/strato-sequencer | the logs of the sequencer, which handles the ordering of blocks, transactions, and private chains coming from the API and P2P before sending them to the VM, and also runs all PBFT logic |
 | | strato-p2p | in container, logs/strato-p2p | STRATO P2P logs (communication between peers) |
 | | slipstream | in container, logs/slipstream | the logs of Slipstream, the indexer of blockchain data inserted into the Postgres 'cirrus' db |
 | | ethereum-discover | in container, logs/ethereum-discover | P2P discovery logs (STRATO network peer auto-discovery) |
 | | logserver | in container, logs/logserver | the logs of the web server providing HTTP access to the node's logs when SERVE_LOGS=true (optional) |
 | | strato-adit | in container, logs/strato-adit | the logs of the proof-of-work miner, which is just a pass-through for new blocks sent from the VM to the sequencer when PBFT is enabled (default) |
 | | strato-api | in container, logs/strato-api | STRATO API access logs (except /cirrus/ , /apex-api/ and /health endpoints) |
 | | strato-api-indexer | in container, logs/strato-api-indexer | the logs of the api-indexer, which inserts blocks, transactions, and private chain info into the Postgres 'eth_' db |
 | | strato-p2p-indexer | in container, logs/strato-p2p-indexer | the logs of the p2p-indexer, which inserts blocks, transactions, and private chain info into Redis |
 | | strato-setup | in container, logs/strato-setup | STRATO Core's initialization script logs |
 | | strato-txr-indexer | in container, logs/strato-txr-indexer | the logs of the txr-indexer, which handles private chain governance events |
Vault | strato_vault-wrapper_1 | - | container logs | STRATO Vault logs (internal API requests from STRATO Core related to transaction signing using the user keys) |
Apex | strato_apex_1 | - | container logs | STRATO health calculation logs, /apex-api/ access logs |
PostgREST | strato_postgrest_1 | - | container logs | PostgREST server logs representing the Cirrus API (/cirrus/ endpoints) access logs |
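To skim several of the STRATO Core log files from the table in one pass, a loop like the following can help (the service list is a subset taken from the table; the `docker exec` line requires a running node, so it is shown commented):

```shell
# Print a header per STRATO Core log file of interest; on a live node,
# uncomment the docker exec line to tail each file.
for svc in vm-runner strato-sequencer strato-p2p slipstream; do
  echo "== logs/${svc} =="
  # docker exec strato_strato_1 tail -n 20 "logs/${svc}"
done
```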
Common Problems
Common problems include:
- The virtual machine, Docker service, or the Vault container was restarted; STRATO alerts that the STRATO Password is unset and requires the Vault password to be re-entered by the administrator.
  Solution: Re-enter the Vault password with `./strato --set-password` when in the `strato-getting-started/` directory.
- The TLS certificate is outdated; HTTP calls to the API or the STRATO Management Dashboard return an expired-certificate warning.
  Solution: Check the crontab job running the letsencrypt-tool's renew script. Execute it manually, find the problem, and fix the command in crontab.
- The STRATO node peers cannot connect to each other (the node starts with 0 or 1 blocks and no peers on the SMD's main page.)
  Solution: Check your private network configuration (e.g. Security Group settings in AWS) - ports 30303/TCP and 30303/UDP should be reachable between the nodes for inbound and outbound traffic.
- A Low Disk Space alert is triggered for the node.
  Solution: Consider increasing the data volume size for your VM or cleaning up any unused data.
- A Low Memory alert is triggered for the node.
  Solution: Make sure your host machine meets the system requirements; check which processes use the most significant amount of RAM; consider restarting the STRATO Core container to see if this solves the problem; create a support ticket in the BlockApps Support Service Desk to let us know about the issue causing the high memory usage.
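For the peer-connectivity problem above, TCP reachability of port 30303 can be probed from one node toward another; a bash-only sketch (the peer address is a placeholder, and UDP reachability needs a separate tool such as your cloud provider's reachability checks):

```shell
# Probe a peer's 30303/TCP P2P port using bash's /dev/tcp with a timeout.
# Port number is from this document; everything else is illustrative.
check_p2p_tcp() {
  if timeout 3 bash -c "echo > /dev/tcp/$1/30303" 2>/dev/null; then
    echo "$1:30303/TCP reachable"
  else
    echo "$1:30303/TCP NOT reachable"
  fi
}

check_p2p_tcp 127.0.0.1   # replace with the peer node's address
```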
Unresolved Problems
If you experience problems that cannot be resolved using the steps provided in this document, please consider creating a support ticket on BlockApps Support Service Desk so we can assist you promptly.