_images/atl2077_gold_cell.png

mlcomm.algorithms

Adaptive Communication Decision and Information Systems (ACDIS) - User License https://bloch.ece.gatech.edu/researchgroup/

Copyright (c) 2024-2025 Georgia Institute of Technology

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. Users shall cite ACDIS publications regarding this work.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER INANACTION OF CONTRACT TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

class mlcomm.algorithms.ABT(params)

Adaptive Beam Tracking (ABT) from [1], uses initial alignment from HPM algorithm in [2]

[1] Ronquillo, Nancy, and Tara Javidi. “Active beam tracking under stochastic mobility.” ICC 2021-IEEE International Conference on Communications. IEEE, 2021. [2] Chiu, Sung-En, Nancy Ronquillo, and Tara Javidi. “Active learning and CSI acquisition for mmWave initial alignment.” IEEE Journal on Selected Areas in Communications 37.11 (2019): 2474-2489.

Notes

run_alg(time_horizon)

Executes the HPM algorithm for the simulation and logs various data points.

This method runs the core HPM algorithm, capturing key data points at various stages of the process. The logged data is stored in a dictionary and returned at the end of the method execution.

Parameters

time_horizonint

Number of timesteps to track the moving entity

Returns

dict

A dictionary containing logged data from the simulation run. The keys in the dictionary represent different data points, and the values represent the recorded data for those points.

Raises

SimulationError

If the simulation encounters an error during execution.

Example

log_data = run_alg() print(log_data[“relative_spectral_efficiency”])

class mlcomm.algorithms.AlgorithmTemplate(params)

Description

AlgorithmTemplate is a class to represent the simulation of an algorithm that interacts with a communication channel and an associated codebook graph.

Attributes

cb_graphobject

The codebook graph associated with the simulation.

channelobject

The communication channel used in the simulation.

best_midxint

midx corresponding to the node with the highest mean_reward

log_datdict

Algorithm-specific dictionary for storing simulation data.

Methods

sample(self, node, with_noise=True):

Samples the node’s response with optional noise.

set_best(self)

sets attribute best_midx, the midx with the largest mean reward

calculate_relative_spectral_efficiency(self,node)

Calculates the relative spectral efficiency with respect to the node with the highest mean rewards

calculate_relative_spectral_efficiency(node)

Description

Calculates relative spectral efficiency with respect to node specified and node with highest mean reward, attribute best_node

Parameters

nodeobject

The node to be used in the relative spectral efficiency calculation.

Returns

float

The relative spectral efficiency.

sample(node, transmit_power_dbm=0, with_noise=True, mode='rss')

Description

Samples the node’s response with optional noise.

This method computes the absolute squared value of the conjugate transpose of the node’s field vector multiplied by the channel’s array response. Noise can be optionally included in the computation.

Parameters

nodeobject

The node to be sampled.

transmit_power_dbmfloat

Transmit power over the channel in dbw, not required for BasicChannel

with_noisebool, optional

A flag to indicate whether noise should be included in the sample (default is True).

modestr

Valid choices are ‘rss’ and ‘complex’, default to ‘rss’. Dictates reward returned, some Bayesian algorithms require complex value.

Returns

float

The absolute squared value of the sampled response or complex value within.

set_best()

Description

Sets the attribute best_midx, which is the midx belonging to the node with the highest mean reward.

class mlcomm.algorithms.DBZ(params)

Dynamic Beam Zooming (DBZ) takes a hierarchical approach to intial alignment and beam tracking for an integrated sensing and communication approach. DBZ uses a base algorithm, Lower-Upper Confidence Bound (LUCB) [1,2] for best arm identification in fixed confidence. May be run in the traditional sense, where all sample over time are considered for the mean, or a sample window length is specified where only recent samples are considered. The latter is used for non-stationary reward structures. Different from [1,2], we are only interested in sets of arms with cardinality 1.

Our confidence term and exploration rate differ due to the reward probability distribution.

[1] Kalyanakrishnan, Shivaram, et al. “PAC subset selection in stochastic multi-armed bandits.” ICML. Vol. 12. 2012. [2] Gabillon, Victor, Mohammad Ghavamzadeh, and Alessandro Lazaric. “Best arm identification: A unified approach to fixed budget and fixed confidence.” Advances in Neural Information Processing Systems 25 (2012).

Parameters

paramsdict

dict with params

Attributes

cb_graphobject

The codebook graph associated with the simulation.

deltafloat

confidence term

epsilonfloat

tolerance term that is scaled based on the level h

modestr

Specifies ‘stationary’ or ‘non-stationary’ setting, either or is a valid entry

pnparray

array of 1s and 0s indicating which levels to play

Wnparray of ints

array where each element indicates the sample window length associated with the particular level

afloat

Threshold parameter

bfloat

First confidence parameter

cfloat

Secondary confidence parameter

Notes

Example

bandit = DBZ({‘cb_graph’ : cb_graph, ‘channel’ : channel, ‘delta’ : .1, ‘epsilon’, .001, ‘mode’ : stationary, levels_to_play’ : [1,1,1,1], ‘a’ : 1, ‘b’ : .1, ‘c’ : .1})

get_gamma_u(nn, current_midxs)

Description

Determines the midx corresponding to gamma and u from the LUCB algorithm. gamma and u correspond to the largest LCB and the largest UCB that is not the same midx as gamma.

Parameters

nnint

time step of the algorithm

current_midxslist or array

midxs corresponding to cb_graph object attribute in which the bandit game will play those nodes

Notes

initialize(nn, current_midxs)

Description

Samples all nodes within current_midxs twice, and determines initial gamma and u from the LUCB algorithm. gamma and u correspond to the largest LCB and the largest UCB that is not the same midx as gamma.

Parameters

current_midxslist or array

midxs corresponding to cb_graph object attribute in which the bandit game will play those nodes

perform_sample_update_channel(nn, node_to_sample)

Description

Wrapper function to perform operations necessary during sampling.

Notes

Requires 2* self.cb_graph.M +1 flops for the inner product and absolute value squared.

run_alg(time_horizon)
sampling_iteration(nn, gamma_midx, u_midx)

Description

Subordinate algorithm for sampling in LUCB algorithms. If there is only one arm being considered for sampling, which will happen when the active beam is one of the narrowest beamforming patterns, sample it. Otherwise, proceed through the selection process outlined in [1,2]

Parameters

nnint

time step of the algorithm

gamma_midxint

corresponds to the largest LCB

u_midxint

corresponds to the largest UCB that is not gamma_midx, determined with method ‘get_gamma_u’

Notes

Requires 2*(2*nodes[u_midx].W + 11) + 1 to compute the confidence term and 2*self.cb_graph.M +1 from perform_sample_update_channel

update_node(node, r)

Description

Updates the node’s sample history, and deletes “old” samples as specified by the sampling window length “W”

In the non-stationary setting, this “ages out” older samples of other nodes

Parameters

nodeobject

The node to be updated.

rfloat

The new sample reward.

Returns

None

Notes

Simulated buffered memory, not consuming flops

class mlcomm.algorithms.EKF(params)

Description

Uses the Extended Kalman Filter (EKF) from [1] to select beamforming vectors based on the nearest pointing angle.

[1] V. Va, H. Vikalo and R. W. Heath, “Beam tracking for mobile millimeter wave communication systems,” 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Washington, DC, USA, 2016, pp. 743-747, doi: 10.1109/GlobalSIP.2016.7905941.

Attributes

Mint

number of antenna elements

rhofloat

fading coefficient for channel model

comm_nodeobject

Node from codebook object currently being used to communicate

Methods

delh(x, real=True, hermitian=False)

Linearization for observation model estimate

ekf_recursion()

Standard Extended Kalman Filter recursion steps.

h(x, real=True)

Observation model estimate

log_and_update_channel(nn)

Description

Wrapper function to perform operations for updating the logs and channel fluctuations

run_alg(time_horizon)
update_beam_steering(verbose=False)

If the angle a half beamwidth away in either direction, choose the closest beamforming vector.

class mlcomm.algorithms.HBA(params)

Description

Implements Hierarchical Beam Alignment (HBA) from [1]

[1] Wu, Wen, et al. “Fast mmwave beam alignment via correlated bandit learning.” IEEE Transactions on Wireless Communications 18.12 (2019): 5894-5908.

Attributes

Methods

Notes

get_idx(h, j)
run_alg(time_horizon)
class mlcomm.algorithms.HOSUB(params)

Description

Hierarchical Optimal Sampling for Unimodal Bandits (HOSUB) [1] is a class that extends AlgorithmTemplate to implement a hierarchical optimization algorithm for selecting the best beamforming vector. HOSUB is find the best arm in a user-designated fixed bugdet of time steps, where one sample is taken each time step.

[1] Blinn, Nathan, Jana Boerger, and Matthieu Bloch. “mmWave Beam Steering with Hierarchical Optimal Sampling for Unimodal Bandits.” ICC 2021-IEEE International Conference on Communications. IEEE, 2021.

Attributes

cb_graphobject

The codebook graph associated with the simulation.

Nint

Time horizon for the algorithm.

h0int

Starting level in the hierarchical structure.

cfloat

Exploration-exploitation trade-off parameter.

deltafloat

Confidence parameter for upper confidence bound calculations.

Methods

__init__(self, params):

Initializes the HOSUB algorithm with the provided parameters.

run_alg(self):

Runs the HOSUB algorithm to find the best beamforming vector.

update_node(self, node, r):

Updates the node’s empirical mean reward and upper confidence bound.

run_alg()

Description

Runs the HOSUB algorithm to find the best beamforming vector.

This method iteratively selects and samples nodes in the codebook graph, updating their empirical mean rewards and upper confidence bounds, until the time horizon is reached or the narrowest beamforming vector is found.

Returns

int

The midx corresponding to the estimated best beamforming vector as indexed by the codebook graph.

update_node(node, r)

Description

Updates the node’s empirical mean reward and upper confidence bound.

This method updates the number of pulls, empirical mean reward, and upper confidence bound for the specified node based on the new sample reward.

Parameters

nodeobject

The node to be updated.

rfloat

The new sample reward.

Returns

None

class mlcomm.algorithms.HPM(params)

Description

Implements the Hierarchical Posterior Matching (HPM) [1] algorithm. Given our fixed codebook resolution, we only use the fixed resolution (FR) impelmentation in their algorithm.

[1] Chiu, Sung-En, Nancy Ronquillo, and Tara Javidi. “Active learning and CSI acquisition for mmWave initial alignment.”

IEEE Journal on Selected Areas in Communications 37.11 (2019): 2474-2489.

Attributes

deltafloat

The target confidence for the algorithm.

fading_estimationstr

Fading estimation mode, valid choices are ‘exact’, ‘estimate’, and ‘unitary’

modestr

Fixed length (FL) or variable length (VL) where the algorithm stops after a certain number of timesteps or after achieving confidence 1-delta

Notes

Works only with binary tree codebook.

run_alg()

Executes the HPM algorithm for the simulation and logs various data points.

This method runs the core HPM algorithm, capturing key data points at various stages of the process. The logged data is stored in a dictionary and returned at the end of the method execution.

Returns

dict

A dictionary containing logged data from the simulation run. The keys in the dictionary represent different data points, and the values represent the recorded data for those points.

Raises

SimulationError

If the simulation encounters an error during execution.

class mlcomm.algorithms.MotionTS(params)

Description

Phased Track and Stop Beamforming approach from where only recent samples are considered. In the case that an empirical mean chosen is less than any previous level’s, zoom out.

Parameters

paramsdict

A dictionary of parameters for initializing the NPHTS algorithm. Must include ‘levels_to_play’, ‘delta’, and ‘epsilon’.

  • levels_to_play: List[int]

    A list indicating the levels to be played in the algorithm.

  • delta: List[float]

    A list of delta values for each level.

  • epsilon: float

    The epsilon value for the algorithm.

Raises

AssertionError

If the length of ‘levels_to_play’ does not match the length of ‘delta’. If the last element of ‘levels_to_play’ is not 1.

Methods

__init__(self, params):

Initializes the NPHTS algorithm with the provided parameters.

run_alg(self):

Runs the NPHTS algorithm to find the best beamforming vector.

Notes

Relies on the TASD class for the update_node method.

Slightly modified from the implementation in [3] which has a fixed two-phase approach, whereas here we allow the user to select which levels to play by the key in params ‘levels_to_play’.

run_alg(time_horizon)

Description

Runs the algorithm to find the best beamforming vector.

This method implements the core logic of the NPHTS algorithm, leveraging the parameters initialized in the __init__ method.

Parameters

time_horizonint

Number of time steps to run the simulation

Returns

int

The midx corresponding to the estimated best beamforming vector as indexed by the codebook graph.

int

The number of samples prior to terminating

update_node(node, r, current_midxs)

Description

Updates the node’s sample history, and deletes “old” samples as specified by the sampling window length “W”

In the non-stationary setting, this “ages out” older samples of other nodes

Parameters

nodeobject

The node to be updated.

rfloat

The new sample reward.

Returns

None

class mlcomm.algorithms.NPHTS(params)

Description

Phased Track and Stop Beamforming approach from [1]. This class implements the NPHTS algorithm, an extension of the TASD class, where TASD is run for each level specified by attribute p (‘levels_to_play’) in the hierarchical beamforming codebook.

[1] Wei, Yi, Zixin Zhong, and Vincent YF Tan. “Fast beam alignment via pure exploration in multi-armed bandits.” IEEE Transactions on Wireless Communications (2022).

Original code at: https://github.com/YiWei0129/Fast-beam-alignment

Parameters

paramsdict

A dictionary of parameters for initializing the NPHTS algorithm. Must include ‘levels_to_play’, ‘delta’, and ‘epsilon’.

  • levels_to_play: List[int]

    A list indicating the levels to be played in the algorithm.

  • delta: List[float]

    A list of delta values for each level.

  • epsilon: float

    The epsilon value for the algorithm.

Raises

AssertionError

If the length of ‘levels_to_play’ does not match the length of ‘delta’. If the last element of ‘levels_to_play’ is not 1.

Methods

__init__(self, params):

Initializes the NPHTS algorithm with the provided parameters.

run_alg(self):

Runs the NPHTS algorithm to find the best beamforming vector.

Notes

Relies on the TASD class for the update_node method.

Slightly modified from the implementation in [3] which has a fixed two-phase approach, whereas here we allow the user to select which levels to play by the key in params ‘levels_to_play’.

run_alg()

Description

Runs the NPHTS algorithm to find the best beamforming vector.

This method implements the core logic of the NPHTS algorithm, leveraging the parameters initialized in the __init__ method.

Returns

int

The midx corresponding to the estimated best beamforming vector as indexed by the codebook graph.

int

The number of samples prior to terminating

class mlcomm.algorithms.OffsetMAB(params)

Description

Implements the Algorithm 1 OR ALgorithm 2 from [1] using the confidence terms from the DBZ paper in stationary and non-stationary environments. Only uses the narrowest beamforming vectors, but the “arms” are actually the offset from the arm being used for communication. In updating the rewards, the node assigned the sample history was rolled from another. This mechanic accounts for the motion, as described in [1].

There are some adaptation differences with aligning the time steps with the use of a beamforming vector. The algorithm in [1] executes a beam sweep of several beamforming vectors at a single time step, which does not match my time scale.

[1] Zhang, Jianjun, et al. “Beam alignment and tracking for millimeter wave communications via bandit learning.” IEEE Transactions on Communications 68.9 (2020): 5519-5533.

Parameters

paramsdict

dict with params

Attributes

cb_graphobject

The codebook graph associated with the simulation.

epsilonfloat

‘epsilon-greed’ policy parameter

alphafloat

discount paramter for non-stationary rewards

policystr

Specifies ‘UCB’ or ‘epsilon-greedy’ policy to use for beamforming vector selection

modestr

Specifies ‘stationary’ or ‘non-stationary’ setting, either or is a valid entry

cfloat

Secondary confidence parameter

Notes

‘epsilon’ is unused in policy ‘UCB’ and ‘c’ is unused in policy ‘epsilon-greedy’

Example

bandit = OffsetMAB({‘cb_graph’ : cb_graph, ‘channel’ : channel, ‘epsilon’ : .1, ‘alpha’ : 1e-4,’mode’ : ‘non-stationary’,’policy’ : ‘UCB’, ‘c’ : 1})

perform_sample_update_channel(nn, node_to_sample)

Description

Wrapper function to perform operations necessary during sampling.

Notes

Requires 2* self.cb_graph.M +1 flops for the inner product and absolute value squared.

run_alg(time_horizon)
update_current_midxs(est_best_midx)

Description

The ‘current_midxs’ variables is intended to track the midxs associated with the subset codebook, denoted as calF_{u,b}, in [1].

Parameters

est_best_midxint

midx corresponding to the beamforming vector with the current high average rewards

Returns

current_midxslist of ints

list of midxs based on the indexing of (u,b) in [1]

class mlcomm.algorithms.PF(params)

Description

Uses the Particle Filter (EKF) from [1] to select beamforming vectors based on the nearest pointing angle, and brodens the beam based on the number of antenna elements required.

We obtain our initial state estimate by using an exhaustive search with the most narrow beams.

Unlike the original implementation in [1], which turns elements on and off to narrow or broaden the beam, we choose beamforming vectors from the ternary beamforming codebook based on the number of elements in (15) in [1], that possesses the similar beamwidth. For example, consider a ternary hiearchical codebook with beamwidths at each level: [24, 8, 2.67, 0.89] (in degrees), at time step n, the method in [1] determines 128 elments, which corresponds to roughly to the .89 degree beamwidth in our array of beamwidths. After observation at time step n+1, the algorithm determines 32 elements, which 2/32 is approximately a 3.58 degree beamwidth. Because it is wider than 2.67, we broaden the beam to the level with 8 degrees with the beamforming vector whose pattern points in the direction of the current angle estimate and broaden it.

[1] H. Chung, J. Kang, H. Kim, Y. M. Park and S. Kim, “Adaptive Beamwidth Control for mmWave Beam Tracking,” in IEEE Communications Letters, vol. 25, no. 1, pp. 137-141, Jan. 2021, doi: 10.1109/LCOMM.2020.3022877.

Attributes

Sint

number of particles

comm_nodeobject

Node from codebook object currently being used to communicate

Methods

class Particle(index, initial_weight, initial_state)
get_estimate()
get_predictions()
log_and_update_channel(nn)

Description

Wrapper function to perform operations for updating the logs and channel fluctuations

resample()

Resampling routine from https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python/blob/master/12-Particle-Filters.ipynb

run_alg(time_horizon)
update_weights(z)
zeta_k()
class mlcomm.algorithms.TASD(params)

Description

Track and Stop D (TASD) [1] algorithm performed on midxs specified from the cb_graph object passed on initialization (see documentaiton for AlgorithmTemplate)

[1] Garivier, Aurélien, and Emilie Kaufmann. “Nonasymptotic sequential tests for overlapping hypotheses applied to near-optimal arm identification in bandit models.” Sequential Analysis 40.1 (2021): 61-96.

Original Julia code at https://github.com/EmilieKaufmann/BAI_epsBAI_code/blob/master/EpsilonBAIalgos.jl

Attributes

deltafloat

Confidence parameter for upper confidence bound calculations.

epsilonfloat

Tolerance paramter in which rewards are epsilon-optimal if mu + epsilon>= mu_max

Methods

__init__(self, params):

Initializes the HOSUB algorithm with the provided parameters.

run_base_alg(self):

Runs the TASD algorithm to find the best beamforming vector.

update_node(self, node, r):

Updates the node’s empirical mean reward and upper confidence bound.

run_base_alg(current_midxs=None, time_horizon=100000, update_logs=True, verbose=False)

Description

Runs the Track and Stop algorithm from [1] with Chernoff stopping + D-Tracking

Parameters

current_midxsarray or list of ints

master indices from codebook object corresponding to nodes which are played in contention in a MAB game. If left unspecified, then by default all of the master indices belonging to nodes with the narrowest beams are used.

Subordinate Functions

rate(t, delta):

Computes the rate function.

d(mu1, mu2):

Computes the divergence between two empirical values.

lambdaX(x, mua, mub, epsilon, pre=1e-12):

Computes the minimizer for lambda in (mu^- ; mu^+ - epsilon).

gb(x, mua, mub, epsilon, pre=1e-12):

Computes the minimum value of d(mua, lambda) + d(mub, lambda + epsilon).

AdmissibleAux(mu, a, epsilon):

Computes the admissible auxiliary values for a given arm.

xbofy(y, mua, mub, epsilon, pre=1e-12):

Finds x such that g_b(x) = y.

dicoSolve(f, xMin, xMax, pre=1e-11):

Finds m such that f(m) = 0 using binary search.

auxEps(y, mu, a, epsilon, pre=1e-12):

Returns F_mu(y) - 1.

aOpt(mu, a, epsilon, pre=1e-12):

Returns the optimal weights and values for the epsilon optimal arm.

OptimalWeightsEpsilon(mu, epsilon, pre=1e-11):

Returns T*(mu) and a matrix containing the candidate optimal weights.

PGLRT(muhat, counts, epsilon, Aeps, K):

Computes the parallel GLRT stopping rule and returns the best arm.

Returns

int

The midx corresponding to the estimated best beamforming vector as indexed by the codebook graph.

int

The number of samples prior to terminating

update_node(node, r, current_midxs)

Description

Updates the node’s empirical mean reward and upper confidence bound.

This method updates the number of pulls, empirical mean reward, and upper confidence bound for the specified node based on the new sample reward.

Parameters

nodeobject

The node to be updated.

rfloat

The new sample reward.

Returns

None