BACnet Network Diagnostics

BACnet Diagnostics explanations

Using Wireshark

For a non-standard BACnet network, use the capture filter: udp portrange 47808-47823
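The range 47808-47823 corresponds to UDP ports 0xBAC0 through 0xBACF. As a minimal sketch, the same range can be expressed in Python next to the equivalent Wireshark capture-filter and display-filter strings. The helper name `is_bacnet_port` is hypothetical and only used for illustration.

```python
# UDP port range from the text above: 0xBAC0 (47808) to 0xBACF (47823).
BACNET_PORT_MIN = 0xBAC0
BACNET_PORT_MAX = 0xBACF

# Wireshark capture filter (BPF syntax) and display filter for the same range.
CAPTURE_FILTER = f"udp portrange {BACNET_PORT_MIN}-{BACNET_PORT_MAX}"
DISPLAY_FILTER = f"udp.port >= {BACNET_PORT_MIN} && udp.port <= {BACNET_PORT_MAX}"


def is_bacnet_port(port: int) -> bool:
    """Return True if the UDP port falls inside the range above (hypothetical helper)."""
    return BACNET_PORT_MIN <= port <= BACNET_PORT_MAX


print(CAPTURE_FILTER)
print(DISPLAY_FILTER)
```

The capture filter limits what gets recorded, while the display filter only hides packets in a capture that was already taken.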

Detected Circular Networks

Overview

  • Circular networks happen when you have two or more routes to the same controller.
  • The check looks at the Hop Count value of every packet. The traffic needs to be BACnet/IP.
  • If the Hop Count drops below 10, it gets flagged as a Circular Network.
  • A Circular Network will always trigger the Low Hop Count check as well.
  • Could happen with BACnet MS/TP, but it is rarer.
  • Typically, the routes are BACnet/IP and BACnet/Ethernet and both are communicating on both networks.

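Hop Count refers to the number of devices, usually routers, that a packet travels through. Each time a packet moves from one router (or device) to another (say, from the router of your home network to the one just outside your county line), that is one hop. In BACnet, the Hop Count field starts at 255 and each router that forwards the packet decrements it, so a value below 10 means the packet has been forwarded far more often than a healthy network requires.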
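The Hop Count check described above can be sketched as follows. This is a minimal example, assuming the usual BACnet NPDU layout (version, control, optional destination fields, optional source fields, then the Hop Count, which is only present when a destination network is specified). The function names are hypothetical, and the threshold of 10 is the one quoted in the overview.

```python
LOW_HOP_COUNT_LIMIT = 10  # threshold quoted in the overview


def npdu_hop_count(npdu: bytes):
    """Return the Hop Count of a BACnet NPDU, or None if the NPDU carries none.

    Assumed layout: version, control, [DNET(2), DLEN(1), DADR],
    [SNET(2), SLEN(1), SADR], [Hop Count].
    """
    if len(npdu) < 2 or npdu[0] != 0x01:
        raise ValueError("not a BACnet version 1 NPDU")
    control = npdu[1]
    if not control & 0x20:       # no destination network, so no Hop Count
        return None
    pos = 2
    dlen = npdu[pos + 2]
    pos += 3 + dlen              # skip DNET, DLEN, DADR
    if control & 0x08:           # source specifier present
        slen = npdu[pos + 2]
        pos += 3 + slen          # skip SNET, SLEN, SADR
    return npdu[pos]


def is_circular_suspect(npdu: bytes) -> bool:
    """Flag packets whose Hop Count has dropped below the limit."""
    hop = npdu_hop_count(npdu)
    return hop is not None and hop < LOW_HOP_COUNT_LIMIT


fresh = bytes.fromhex("0120ffff00ff")            # global broadcast, Hop Count 255
looped = bytes.fromhex("0128ffff00000101" "0503")  # routed many times, Hop Count 3
print(npdu_hop_count(fresh), is_circular_suspect(fresh))
print(npdu_hop_count(looped), is_circular_suspect(looped))
```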

Why it fails

  • Both boxes (BACnet/IP and BACnet/Ethernet) were checked on the manufacturer’s controllers.
  • The site was migrated from Ethernet to IP.
  • The Ethernet network was kept because it is easy to connect to.

How to fix it

  • A Circular Network causes such a traffic storm that you won’t be able to reach any of the controllers.
  • You need to unplug one of the problematic controllers, isolate it from the network, and reconfigure it.
  • You can then add it back onto the network.

Token Disruptions

Overview

  • Indicates the number of Reply-To-Poll-For-Master frames sent in the capture. In a stable MS/TP network, there will be Poll-for-Masters every 50 token round-trips.
  • “Destination” is the device that sent the Poll-for-Master.
  • The highest numbered device (last device) or the last device before a break in numbering will Poll-for-Master for all numbers up to its max Poll-for-Master limit. If you don’t have a set limit, it defaults to 127.
  • If you don’t have any new devices, there will not be any responses. If you do have new devices, they will respond with “Reply-to-Poll-for-Master”.
  • If a device doesn’t accept the token twice in a row (device offline or token lost because of wiring), the sender will do a Poll-for-Master to identify the next device in line.
  • That device will be skipped for 50 round-trips, and then when the token comes to the break in numbers it will Poll-for-Master again. If the device is back online it will “Reply-to-Poll-for-Master”.
  • Fails if there is at least one Reply-to-Poll-for-Master in the capture.
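A minimal sketch of the counting behind this check is shown below. It assumes the capture has already been decoded into simple tuples (a hypothetical format), uses the MS/TP frame type codes as they appear in the BACnet standard, and treats one or more replies as a failure.

```python
from collections import Counter

# MS/TP frame type codes (Token is 0x00).
POLL_FOR_MASTER = 0x01
REPLY_TO_POLL_FOR_MASTER = 0x02


def token_disruptions(frames):
    """Count Reply-To-Poll-For-Master frames per (source, destination).

    `frames` is an iterable of (frame_type, source_mac, destination_mac)
    tuples, a hypothetical simplified view of a decoded MS/TP capture.
    """
    replies = Counter()
    for frame_type, source, destination in frames:
        if frame_type == REPLY_TO_POLL_FOR_MASTER:
            replies[(source, destination)] += 1
    return replies


capture = [
    (0x00, 1, 2),  # Token passed from 1 to 2
    (0x01, 2, 3),  # 2 polls for a master at address 3
    (0x02, 3, 2),  # 3 replies: it is new or has come back online
    (0x00, 2, 3),  # Token passed from 2 to 3
]
replies = token_disruptions(capture)
check_failed = sum(replies.values()) >= 1  # assumption: one or more replies fails
print(dict(replies), check_failed)
```

The source of each reply is the device that joined or rejoined the token ring, and the destination is the device that polled for it.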

Why it fails

  • Token loss due to wiring.
  • The device went offline.
  • The device is faulty.
  • A new device was added.

How to fix it

  • Look at the sources to identify the devices replying to the Poll-for-Master.
  • Is it a new device you just added? If so, it’s OK.
  • Check if it’s in the Unresponsive Devices list.
  • Check the wiring and see if there are any problems that may be causing the device to go on and offline.

Checksum Errors

Overview

  • Packet is malformed (it’s gibberish).
  • On MS/TP, it typically means the packet was clobbered by the network (poor wiring).
  • On IP, it can be bad wiring or electrical influence on the wire.
  • This is not a BACnet router problem; it’s a physical router problem. The router itself is failing, or perhaps you have a loose cable or power fluctuations.
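To make "malformed" concrete for MS/TP, the sketch below recomputes the header CRC of a frame and compares it with the byte received. It follows the bytewise header CRC routine as commonly reproduced from the BACnet standard's MS/TP annex; the function names are hypothetical, and the data CRC (a separate 16-bit check) is not covered.

```python
def _crc8_update(data_byte: int, crc: int) -> int:
    """Accumulate one byte into the MS/TP header CRC register."""
    crc ^= data_byte
    crc = (crc ^ (crc << 1) ^ (crc << 2) ^ (crc << 3)
           ^ (crc << 4) ^ (crc << 5) ^ (crc << 6) ^ (crc << 7))
    return (crc & 0xFE) ^ ((crc >> 8) & 1)


def header_crc(header: bytes) -> int:
    """CRC over frame type, destination, source, and the 2-byte length."""
    crc = 0xFF
    for byte in header:
        crc = _crc8_update(byte, crc)
    return ~crc & 0xFF  # the ones complement is what is transmitted


def header_is_valid(header_with_crc: bytes) -> bool:
    """Compare the received CRC byte with the one computed from the header."""
    *header, received = header_with_crc
    return header_crc(bytes(header)) == received


# Token frame from station 0x05 to station 0x10, no data.
token = bytes([0x00, 0x10, 0x05, 0x00, 0x00])
print(hex(header_crc(token)))                       # 0x8c
print(header_is_valid(token + bytes([0x8C])))       # intact frame
print(header_is_valid(bytes([0x00, 0x10, 0x04, 0x00, 0x00, 0x8C])))  # one bit flipped
```

A single flipped bit on the wire is enough to make the comparison fail, which is why noisy or poorly terminated cabling shows up as checksum errors.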

Why it fails

  • Bad wiring
  • Failing devices (overheating, old, internal electronic problems, etc.)
  • Power fluctuations

How to fix it

  • Use the Source and Destination devices to isolate the path
  • Check if other destinations have received “normal” packets from the source
  • Look at other destinations with checksum errors to isolate the problem switch or wire
  • If it’s a switch, replace it
  • If it’s a wire, test and replace it

Duplicate Networks

Overview

  • More than one BACnet router routing traffic to the same network.
  • There can only be one router per segment, so each router needs a unique network number for each segment.
  • When network numbers get duplicated, you end up with two routers routing traffic to the same network.
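The condition above can be sketched as a grouping problem: collect the network numbers each router announces and report any number claimed by more than one router. This is a minimal example; the input format and the function name are hypothetical.

```python
from collections import defaultdict


def duplicate_networks(announcements):
    """Map each network number to its routers when more than one claims it.

    `announcements` is an iterable of (router_address, [network_numbers])
    pairs, for example gathered from I-Am-Router-To-Network messages.
    """
    routers_by_network = defaultdict(set)
    for router, networks in announcements:
        for network in networks:
            routers_by_network[network].add(router)
    return {net: sorted(routers)
            for net, routers in routers_by_network.items()
            if len(routers) > 1}


seen = [
    ("10.0.0.5", [100, 200]),
    ("10.0.0.9", [200, 300]),  # also claims network 200
]
print(duplicate_networks(seen))
```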

Why it fails

  • Misconfigurations.
  • Merging sites.
  • There are multiple vendors on a site and no one to coordinate it all.
  • This is strictly a logical network problem, not a physical one.

How to fix it

  • Go into the network that’s currently online and change the network number.
  • Now it will communicate with the correct controllers, just like you want it to.
  • The other router will start communicating, and you can reset it too.
  • Make sure you always work with the controller that’s online. Otherwise you’ll have to fight to get controllers online and figure out which router is talking to which controllers.

Duplicate Device ID

Overview

  • More than one device on the same BACnet Network with the same BACnet Device Instance (aka Device ID).
  • Does not distinguish between different UDP ports, so if they are on different ports, it could be a false fail (have to drill down into frame info).
  • Still recommended to give them all unique IDs in case you need to change your applications or reconfigure the site in the future.

Why it fails

  • Bad address mapping and planning.
  • DIP switch on the controller.
  • Software addressing.
  • Merging networks (BBMD, etc.).
  • Adding controllers from other vendors.
  • Factory defaults.

How to fix it

  • Look at the Device ID that is duplicated, and at how many there are.
  • Drill down, and cross-reference the SNET and SADR with your device map to identify which devices are duplicated.
  • Ensure you have a clear Device ID naming convention and map.
  • Fix them: Give one a different address or put it offline so you can identify it, then start re-commissioning.

Duplicate BBMD Detection

Overview

  • A BACnet Broadcast Management Device (BBMD) forwards broadcast messages from devices on its own subnet as unicast messages to BBMDs on other subnets.
  • When the message reaches the destination subnet, it is rebroadcast there.
  • With duplicate BBMDs, multiple devices on the same network are set up as BBMDs and rebroadcast the same messages.

Why it fails

  • A setup issue, typically on a mixed vendor site.
  • The first vendor sets up multiple networks and connects them with BBMDs.
  • A site upgrade or change takes place, a new vendor wins the contract, and they add devices to each network.
  • To connect the devices, they put a BBMD on each subnet.
  • This can continue, resulting in two, three, or more BBMDs on each subnet, and double or triple the traffic. There should only be one BBMD per network.

How to fix it

  • Find all the problem vendors using the IP addresses, vendor identifier, and MAC address in Visual BACnet.
  • Isolate networks if you don’t need integration.
  • If you do need integration, assign one vendor and their BBMDs to the task.
  • A good rule of thumb is to use one vendor’s BBMD, as it’s easier to make site-wide changes.

Duplicate Source Address

Overview

  • Similar to Duplicate Device ID, but instead looks at the SNET and SADR.
  • This fails if more than one device sends an I-Am with the same Source Network and Source Address.
  • In Duplicate Device ID, the DIP switch settings between two devices are different, but the Device IDs are the same. In Duplicate Source Address, you have different Device IDs, but the DIP switch settings are the same.
  • If they are on the same network and have the same MAC, this will fail (these are used to derive the Source Address).
  • More likely to occur in MS/TP, unlikely in IP and Ethernet (possible, but very unlikely).
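The difference between this check and Duplicate Device ID comes down to which key is repeated. A minimal sketch, assuming I-Am messages have been reduced to (Device ID, SNET, SADR) tuples (a hypothetical format):

```python
from collections import defaultdict


def find_duplicates(i_ams):
    """Return (duplicate_device_ids, duplicate_source_addresses).

    `i_ams` is an iterable of (device_instance, snet, sadr) tuples
    taken from I-Am messages in a capture.
    """
    addresses_by_id = defaultdict(set)
    ids_by_address = defaultdict(set)
    for device_id, snet, sadr in i_ams:
        addresses_by_id[device_id].add((snet, sadr))
        ids_by_address[(snet, sadr)].add(device_id)
    dup_ids = {k: sorted(v) for k, v in addresses_by_id.items() if len(v) > 1}
    dup_addrs = {k: sorted(v) for k, v in ids_by_address.items() if len(v) > 1}
    return dup_ids, dup_addrs


i_ams = [
    (1001, 20, "05"),
    (1001, 20, "06"),  # same Device ID, different MAC: Duplicate Device ID
    (2001, 30, "0a"),
    (2002, 30, "0a"),  # different Device IDs, same MAC: Duplicate Source Address
]
dup_ids, dup_addrs = find_duplicates(i_ams)
print(dup_ids)
print(dup_addrs)
```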

Why it fails

  • IP or Ethernet: incorrect factory programming.
  • Incorrect MS/TP segment setup.

How to fix it

  • On an MS/TP device, change the MAC Address
  • On an IP or Ethernet device, move it to a different network or replace the device

Unresponsive Routers

Overview

  • Similar to Unresponsive Devices, but here the router responsible for a network doesn’t reply with an I-Am-Router-To-Network to a Who-Is-Router-To-Network that specifies that network.
  • Does not take into account a Global Who-Is-Router-To-Network (no network is specified, so the check can’t tell if one is missing).

Why it fails

  • The network doesn’t exist.
  • The router’s offline (power or network connection).
  • The router’s busy (hammered with packets).
  • The network number was changed (but the source doesn’t know that).

How to fix it

  • Reset the source device.
  • Get the network number, which you can cross-reference with your site map to figure out which router is responsible for that network.
  • See if it’s online and stable.
  • Check to see if the network number changed.
  • Look at how much traffic is going through it (traffic where it’s the source and the destination).

Busy Routers

Overview

  • Devices and networks can be configured to send a Router-Busy-To-Network message based on certain thresholds.
  • When that router is busy, it will send a Router-Busy-To-Network message to any networks trying to talk to it.
  • When it is no longer busy, it will send a Router-Available-To-Network message, clearing the previously busy signal.
  • The diagnostic check fails if one or more routers send out a Router-Busy-To-Network message.

Why it fails

  • Too much broadcast or directed/unicast traffic.
  • The amount depends on the specific vendor, hardware capabilities, and configuration.
  • The router is servicing too many networks.
  • The router is servicing other things (firmware, etc.)

How to fix it

  • Look at how busy the network is, and which devices are sending the most traffic to it.
  • If those devices are sending traffic as they should, look at the router.
  • See if the router is chewing up its resources.
  • See what it’s trying to service (events or other networks).
  • Check if it’s trying to send data, or if a trend log is too big.

Reject-to-Networks

Overview

  • Shows the number of networks with at least one Reject-Message-To-Network message in the capture.
  • Router responds with a Reject-Message-To-Network if it doesn’t like the packet. Fails if there is a Reject-Message-To-Network for one or more networks.

Why it fails

  • Device is badly configured.
  • Proprietary device is talking on the network.
  • Router cannot reach the destination network.
  • Non-standard BACnet information is being communicated.
  • The router firmware was updated, but changes haven’t taken effect.

How to fix it

  • Identify the reject reason.
  • Identify the source device that is sending the message.
  • Ensure the router in the source network and any routers in between the source and the destination network have the latest information.
  • Reset the device.
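For the first step above, the reject reason travels as a one-byte code in the Reject-Message-To-Network message. A lookup table along these lines can make a capture easier to read. The reason texts are paraphrased from the BACnet network layer specification as best understood here, so verify them against the revision of the standard you work with; `describe_reject` is a hypothetical helper.

```python
# Reject reason codes (paraphrased; check against your revision of the standard).
REJECT_REASONS = {
    0: "Other error",
    1: "Router is not directly connected to DNET and cannot find a router to it",
    2: "Router is busy and cannot accept messages for DNET",
    3: "Unknown network layer message type",
    4: "Message is too long to be routed to DNET",
    5: "Rejected due to a BACnet security error",
    6: "Rejected due to an addressing error",
}


def describe_reject(reason_code: int, dnet: int) -> str:
    """Build a readable line for one Reject-Message-To-Network."""
    reason = REJECT_REASONS.get(reason_code, "Unrecognized reason code")
    return f"network {dnet}: {reason} (code {reason_code})"


print(describe_reject(1, 2001))
print(describe_reject(42, 2001))
```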

Standard Deviation of Token Round-Trip Time

Overview

  • This shows how much the round-trip time of a token is changing.
  • A larger number means more fluctuation in round-trip time.
  • Large Standard Deviation shows problems that are inconsistent, changing every trip.
  • Fail if: the fluctuation is more than 0.5ms.
  • Warning if: the fluctuation is more than 0.1ms.

Why it fails

  • Token disruptions, caused by:
    • Bad wiring.
    • Devices coming on and offline.
  • A device that is speaking for a long time inconsistently.

How to fix it

  • Check for token disruptions.
  • If token disruptions, look for offline devices or check wiring.
  • If no token disruptions, find the pair that have the largest standard deviation (sort Std. Dev. (ms) column).
  • Most commonly the destination is not accepting it.
  • Could also be that the source is chatty and holding the token too long.
  • Check to ensure the destination is online, configured and wired properly.

Average Token Round-Trip Time

Overview

  • Amount of time it takes for a token to complete one full trip.
  • Shows the path the token took.
  • Time is measured from when a source receives the token to when it receives it again.
  • Average of all the token round trips for every master in the capture.
  • Long average round-trip time shows problems that may be consistent every trip.
  • Fails if Average Token Round-Trip time is more than 2000ms (2 seconds).
  • Warning if Average Token Round-Trip time is more than 85ms (0.085 seconds).
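The thresholds for this check and for the Standard Deviation of Token Round-Trip Time check can be sketched together. This is a minimal example using Python's `statistics` module; whether the tool uses the population or the sample standard deviation is not stated, so the population form here is an assumption, and the function names are hypothetical.

```python
from statistics import mean, pstdev

# Thresholds quoted in the text, in milliseconds.
MEAN_WARN_MS, MEAN_FAIL_MS = 85.0, 2000.0
STDEV_WARN_MS, STDEV_FAIL_MS = 0.1, 0.5


def grade(value, warn, fail):
    """Classify a value against a warning and a failure threshold."""
    if value > fail:
        return "fail"
    if value > warn:
        return "warning"
    return "pass"


def token_round_trip_report(round_trips_ms):
    """Summarize a list of token round-trip times (milliseconds)."""
    avg = mean(round_trips_ms)
    spread = pstdev(round_trips_ms)  # population std. dev. (an assumption)
    return {
        "mean_ms": avg,
        "mean_check": grade(avg, MEAN_WARN_MS, MEAN_FAIL_MS),
        "stdev_ms": spread,
        "stdev_check": grade(spread, STDEV_WARN_MS, STDEV_FAIL_MS),
    }


report = token_round_trip_report([40.0, 40.2, 39.8, 40.0])
print(report)
```

In this sample the average of about 40 ms passes, while the spread of roughly 0.14 ms lands in the warning band, which illustrates how a network can be fast on average and still inconsistent.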

Why it fails

  • Problems with wiring.
  • There may be a device offline.
  • Token disruptions (caused by wiring or devices that are offline).
  • Too many masters in the network using the token, holding onto token longer because they have more to say.
  • Device(s) that are trying to talk too much.

How to fix it

  • Check for token disruptions.
  • If there are token disruptions, look for offline devices or check wiring.
  • If no token disruptions, sort the drill down to find the device pair with the longest token passing time (sort using Mean (ms))
  • Check the configuration of the devices with the longest time with the token.
  • Check how busy the device is – it could be overwhelmed

Longest Response Time

Overview

  • Indicates the longest time between a request and a corresponding response in the entire capture.

Why it fails

  • The router is busy/overloaded
  • There are too many hops (too many routers in between the requesting device and the responding device)
  • The network is congested
  • The destination controller is busy

How to fix it

  • Identify the path between the requesting and responding device for each slow response time (using your network drawing)
  • Look at those routers in the BACnet browser and figure out what traffic is passing through them
  • Check the responding device, see if the device is busy.

Missing ACKs

Overview

  • A Confirmed-Request needs an acknowledgement, or an Abort, Reject, or Error. If you don’t get any of those, it’s a Missing ACK.
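Both this check and Longest Response Time rely on pairing each Confirmed-Request with its reply. A minimal sketch, assuming packets have been reduced to simple tuples and that a request and its reply share an invoke ID between the same two devices (the tuple format and function name are hypothetical):

```python
def match_requests(packets):
    """Pair Confirmed-Requests with their replies.

    `packets` is a time-ordered iterable of
    (time_s, kind, source, destination, invoke_id), where kind is
    "request" or "reply" (an ACK, Error, Reject, or Abort).
    Returns (longest_response_time_s, missing), where `missing` lists
    the (source, destination, invoke_id) of unanswered requests.
    """
    pending = {}
    longest = None
    for time_s, kind, source, destination, invoke_id in packets:
        if kind == "request":
            pending[(source, destination, invoke_id)] = time_s
        elif kind == "reply":
            # A reply travels in the opposite direction of its request.
            sent = pending.pop((destination, source, invoke_id), None)
            if sent is not None:
                elapsed = time_s - sent
                longest = elapsed if longest is None else max(longest, elapsed)
    return longest, sorted(pending)


packets = [
    (0.00, "request", "A", "B", 1),
    (0.25, "reply", "B", "A", 1),
    (1.00, "request", "A", "C", 2),  # never answered: a Missing ACK
]
longest, missing = match_requests(packets)
print(longest, missing)
```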

Why it fails

  • The destination device is offline.
  • The router that it goes through is offline or busy

How to fix it

  • Check if the destination device is unresponsive.
  • If it’s unresponsive, then that’s why you have a missing ACK. Otherwise, check what’s in between.
  • Check unresponsive devices, to see if they’re going online and offline.

Global Who-Is Router

Overview

  • A router looking for a router to one or more networks.
  • It sends out a “Who-Is Router to Network” without specifying the network number, and all routers in the network respond with I-Am Router.
  • If the router specifies the network, it’s not included in this check.
  • Who-Is Router and I-Am Router are both broadcast, so on a large network if they all respond at the same time, it causes a lot of traffic.
  • Fails if we see one or more sources sending out more than one Global Who-Is Router.

Why it fails

  • The router or controller was offline, came back online, and is trying to figure out if its information is up to date.
  • The router is configured to periodically ask for this.

How to fix it

  • Identify source device sending the Global Who-Is Router.
  • Check to see if router is stable and staying online.

Global Who-Is

Overview

  • Devices use Who-Is service (broadcast) to discover other BACnet devices on the network.
  • Usually it’s targeted at a specific device, but in some cases they broadcast a Global Who-Is to the whole network, and they will all respond with I-Ams (broadcast).
  • Global Who-Is should never be sent automatically, or should be used very sparingly so as not to flood the network.
  • Fails if a single source sends more than one Global Who-Is during a capture.
  • Any segmented Who-Is is counted as a Global Who-Is.
  • Segmenting is a way to break up the discovery into many smaller requests, each looking for a range within the entire BACnet device ID range.
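The failure condition can be sketched as a per-source count. This minimal example only treats a Who-Is with no device instance range as global; it does not try to recognize a segmented Who-Is, which the check described above also counts. The input format and function name are hypothetical.

```python
from collections import Counter


def global_who_is_offenders(who_is_requests):
    """Return sources that sent more than one Global Who-Is.

    `who_is_requests` is an iterable of (source, low_limit, high_limit);
    a Who-Is with both limits set to None is treated as global.
    """
    counts = Counter(source for source, low, high in who_is_requests
                     if low is None and high is None)
    return {source: n for source, n in counts.items() if n > 1}


requests = [
    ("bms", None, None),
    ("bms", None, None),   # second Global Who-Is from the same source
    ("tool", None, None),  # a single Global Who-Is does not fail the check
    ("ctrl", 1000, 1000),  # targeted Who-Is, ignored
]
print(global_who_is_offenders(requests))
```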

Why it fails

  • Misconfiguration of the BMS (constantly or regularly discovering the network).
    • This could be a setting or incorrect default.
  • Poor implementation of the discover process.
    • Good intentions can cause this problem. Keeping an accurate BACnet device count is crucial to a network, especially with security, but the network can pay for that.
  • Manual global discovery during the capture.
    • This is a great way to get as much BACnet device information as possible for troubleshooting. Be careful, though, as it can have drastic effects on the network. As the network’s size increases, this becomes more problematic.
  • General
    • Many systems will do this check to make sure that the BACnet device list is up to date. There are other ways to do this, such as a targeted Who-Is, but every system will need to find the controllers on a network.

How to fix it

  • Identify the source device causing the problems.
    • Determine which device and how often it is sending it.
    • Is there a regular frequency? Is it random?
    • Is it the same source every time or multiple sources? It could be different servers or vendors.
  • Check the configuration on the source device(s) sending the Global Who-Is.
    • Is it normal? Can it be turned off or reduced?
  • If manually triggered, note the time and the source so you can easily track it throughout the system.

Global Private-Transfers

Overview

  • Check if there’s a global broadcast of a proprietary service.
  • Warn: 10 sources that send out a global broadcast of proprietary service.
  • Fail: 300 sources that send out a global broadcast of proprietary service.

Why it fails

  • There are too many devices sending out global broadcast of proprietary service.

How to fix it

  • Need to know what they’re sending.
  • Get a list of the sources.
  • Remove unnecessary sources.

BACnet Broadcast Traffic

Overview

  • Looks at the overall average rate of Broadcast Traffic over the length of the capture.
  • Divides the total number of broadcast packets by the length of the capture.
  • Fails if: there is an average of more than 10 pps.
  • Warning if: the average is 1 – 9.99 pps.
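The calculation above is a single division followed by a threshold comparison. A minimal sketch, with a hypothetical function name; the text leaves a gap between 9.99 and 10 pps, so treating exactly 10 pps as a warning is an assumption.

```python
def broadcast_check(broadcast_packets: int, capture_seconds: float):
    """Return (average pps, result) using the thresholds from the text."""
    pps = broadcast_packets / capture_seconds
    if pps > 10:
        result = "fail"
    elif pps >= 1:
        result = "warning"  # assumption: exactly 10 pps is still a warning
    else:
        result = "pass"
    return pps, result


print(broadcast_check(1800, 600))  # 10-minute capture, 1800 broadcasts
print(broadcast_check(9000, 600))
print(broadcast_check(120, 600))
```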

Why it fails

  • Left defaults.
  • Didn’t properly configure devices based on site scaling up.
  • Common things that can broadcast:
    • Time
    • Events
    • Etc.

How to fix it

  • Identify the device sending Broadcast Traffic.
  • Manually direct the traffic to the appropriate destination devices, rather than broadcasting it.

Unresponsive Devices

Overview

  • If a Who-Is is sent to a device, and no I-Am is received, that device is flagged as unresponsive.

Why it fails

  • Bad programming links.
  • Bad graphics links (phantom devices).
  • Incorrect Data Exchange.
  • No power (unplugged, failed, or intermittent power supply).
  • No network (not physically plugged in, wire broken, intermittent).
  • Loss of communication.
  • Too many devices on the same transformer.
  • Router above it is offline or overloaded (*Check for Unresponsive Routers first).

How to fix it

  • Check to ensure there are no Unresponsive Routers first. If there are, fix them. (This is very common)
  • Identify the devices that are unresponsive.
  • Check if they are physically there (look at a list or check in person).
  • If they are there and are supposed to be there, put them back online.
  • If they aren’t supposed to be there, trace the source in Visual BACnet, and fix the device that keeps asking for it.
  • If it keeps going offline, track how much traffic is being sent to the destination to see if it’s being overloaded (use the graphs).

Error Responses

Overview

  • This comes from Confirmed-Request.
  • When there are problems with the request or with the device that is servicing the request, the response can be either Error, Abort, or Reject.
  • For each of those, there will be a list of reasons associated with it.

Why it fails

  • Depending on the response (Error, Abort, or Reject), there will be a list of reasons provided.

How to fix it

  • Look to the list of reasons provided and proceed accordingly.

Alarm

Overview

  • The initiator (source) of the request is the device that asks another device (destination) for a summary of active alarms.
  • This is primarily done from the server.
  • The alarms belong to the device being queried.
  • In Visual BACnet under Alarms, the destinations are the devices with alarm states (which are found in the second table).
  • Warning if more than 1 GetAlarmSummary is sent during capture
  • Fail if more than 8 GetAlarmSummaries are sent during capture

Why it fails

  • Software misconfiguration
  • Manually triggered GetAlarmSummary request
  • Multiple front ends

How to fix it

  • Check configuration to see how often it is asking for the GetAlarmSummary
  • Set up confirmed notifications so that alarms get sent when they occur, instead of being polled for
  • Reduce the amount of traffic on the network
  • Add more routing

Write-Property Traffic

Overview

  • Write-Property writes data to another controller, that is, it writes to an object to perform an action.
  • Checks for excessive writes to any object.
  • For security reasons, you shouldn’t have a lot of writes.

Why it fails

  • Misconfigured programming – devices doing Write-Properties unnecessarily.
  • Incorrect data exchange.
  • Bad device selection (e.g. device with output doesn’t have programming space).
  • A configurable device has an output on it.
  • Cybersecurity attack.

How to fix it

  • Identify who is sending the Write-Property traffic.
  • See if you can change the write to a read.
  • If not, reduce the amount of Write-Property traffic.

Read-Property Traffic

Overview

  • Check for Read-Property service.
  • Commonly used for polling (for example, every 30 seconds a controller goes to get data from other devices, instead of them sending the data to the controller) as an alternative to COV.
  • Some devices do not support COV, so they would use Read-Property to get the latest value.

Why it fails

  • The check fails because there’s too much Read-Property Traffic.

How to fix it

  • If it’s because of polling, you can reduce the rate at which you poll.
  • Do you really need this data?
  • Consider COV instead of Read-Property.

Confirmed-COV Traffic

Overview

  • There is too much Confirmed Change of Value (COV) traffic, usually because the COV threshold is too sensitive.
  • For example, when the lights turn on, there’s a response back saying the lights turned on. Or, the temperature fluctuates by a certain amount, and a notification gets sent.
  • The recipient has to send back that it received the Change of Value.
  • It’s a configuration issue that can create a lot of traffic.

Why it fails

  • Configuration issue, where the value has increments that are too small. E.g. Change of Value should be 2° instead of 0.001°.
  • Leftover commissioning trend logs.
  • Getting Confirmed-COVs for irrelevant changes. E.g. a confirmation on temperature, or lighting. Usually you want a Confirmed COV with outputs, not inputs.
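The effect of the increment can be shown with a small simulation. This is a simplified model, not any vendor's implementation: a notification is counted whenever the value has moved by at least the increment since the last reported value. The function name and the sample data are made up for illustration.

```python
def cov_notifications(samples, increment):
    """Count notifications sent when a value moves by at least `increment`
    from the last reported value (a simplified model of COV reporting)."""
    last_reported = samples[0]
    sent = 0
    for value in samples[1:]:
        if abs(value - last_reported) >= increment:
            sent += 1
            last_reported = value
    return sent


# A room temperature drifting in small steps, with one real change.
temperatures = [21.00, 21.01, 21.03, 21.02, 21.50, 21.48, 23.10, 23.11]
print(cov_notifications(temperatures, 0.001))  # 7
print(cov_notifications(temperatures, 2.0))    # 1
```

With an increment of 0.001° every small wobble is reported, while an increment of 2° reports only the one meaningful change. With Confirmed-COV, each of those notifications also needs an acknowledgement, so the traffic roughly doubles.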

How to fix it

  • Find the device that’s sending out too many COVs, and reconfigure the device.
  • Change the increments.
  • Evaluate if Confirmed-COV is needed.

Unconfirmed-COV Traffic

Overview

  • There is too much Unconfirmed Change of Value (COV) traffic, usually because the COV threshold is too sensitive.
  • For example, when the lights turn on, there’s a response back saying the lights turned on. Or, the temperature fluctuates by a certain amount, and a notification gets sent.
  • No confirmation of receipt is required (unlike Confirmed-COV).
  • It’s a configuration issue that can create a lot of traffic.

Why it fails

  • Configuration issue, where the value has increments that are too small. E.g. Change of Value should be 2° instead of 0.001°.
  • Leftover commissioning trend logs.
  • Getting Unconfirmed-COVs for irrelevant changes. E.g. a notification on temperature, or lighting. Usually you want an Unconfirmed COV with outputs, not inputs.

How to fix it

  • Find the device that’s sending out too many COVs, and reconfigure the device.
  • Change the increments.
  • Evaluate if Unconfirmed-COV is needed.

Confirmed-Event Traffic

Overview

  • There is too much Confirmed-Event traffic, usually because the event thresholds are too sensitive.
  • For example, if temperature triggers an event threshold, Confirmed-Event is triggered and sent.
  • The recipient has to send back that it received the Event.
  • It’s a configuration issue that can create a lot of traffic.

Why it fails

  • Too many Confirmed-Event triggers, or too often. Both will trigger a check failure.

How to fix it

  • It’s a configuration issue.
  • Change the event trigger and time.
  • See if that event trigger is really needed, or if it can be turned off.
  • Could also change the threshold for the trigger.

Unconfirmed-Event Traffic

Overview

  • There is too much Unconfirmed-Event traffic, usually because the event thresholds are too sensitive.
  • For example, if temperature triggers an event threshold, Unconfirmed-Event is triggered and sent.
  • No confirmation of receipt is required.
  • It’s a configuration issue that can create a lot of traffic.

Why it fails

  • Too many Unconfirmed-Event triggers, or too often. Both will trigger a check failure.

How to fix it

  • It’s a configuration issue.
  • Change the event trigger and time.
  • See if that event trigger is really needed, or if it can be turned off.
  • Could also change the threshold for the trigger.

BACnet Buffer Full Broadcasts

Overview

  • Trend logs only.
  • The device that is writing to the log is saying that the log is ready to be read, and it hasn’t been read.
  • Any traffic going into the log either doesn’t go in, or it overwrites what was already there. This leads to gaps in data, either at the end of your data, or the front of your data.
  • Traffic problems are causing BACnet Buffer Full problems, because the Buffer cannot be emptied. It cannot be emptied because it has not been read.

Why it fails

  • The log is not being read.
  • The archiver is not online, or somehow cannot reach the controller.
  • The controller’s limits. Every embedded device has X amount of memory, so you can only put so much data on it.
  • There might be traffic issues.
  • It could be a configuration issue, if the trend log buffer is too small. Maybe you’re collecting only five entries instead of 2,000. That would cause this notification to go off all the time.
  • There should be a cleanup service. If there isn’t, the Buffer will continually fill up.
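The buffer-size point above is simple arithmetic: the time to fill is the number of entries multiplied by the logging interval. A minimal sketch, assuming a fixed interval of one sample per minute (an illustrative value, not one taken from the text) and a hypothetical function name:

```python
def seconds_until_full(buffer_entries: int, log_interval_s: float) -> float:
    """Time for an empty trend log buffer to fill at a fixed logging interval."""
    return buffer_entries * log_interval_s


for entries in (5, 2000):
    hours = seconds_until_full(entries, 60) / 3600
    print(f"{entries} entries at one sample per minute: full after {hours:.2f} h")
```

At that rate a five-entry buffer fills in five minutes, so the archiver would need to read it at least that often, while a 2,000-entry buffer gives it more than a day.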

How to fix it

  • Identify which device it was. Look to the source.
  • Check the archiver.
  • Check your configurations: how much data it’s storing, and how fast it’s filling up (in Visual BACnet, pps = how often it’s been sending out the notification on average).
  • Check the router. If the router’s full or busy, it might not be able to read the log.