mlcomm.algorithms
Adaptive Communication Decision and Information Systems (ACDIS) - User License https://bloch.ece.gatech.edu/researchgroup/
Copyright (c) 2024-2025 Georgia Institute of Technology
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. Users shall cite ACDIS publications regarding this work.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER INANACTION OF CONTRACT TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- class mlcomm.algorithms.ABT(params)
Adaptive Beam Tracking (ABT) from [1], uses initial alignment from HPM algorithm in [2]
[1] Ronquillo, Nancy, and Tara Javidi. “Active beam tracking under stochastic mobility.” ICC 2021-IEEE International Conference on Communications. IEEE, 2021. [2] Chiu, Sung-En, Nancy Ronquillo, and Tara Javidi. “Active learning and CSI acquisition for mmWave initial alignment.” IEEE Journal on Selected Areas in Communications 37.11 (2019): 2474-2489.
Notes
- run_alg(time_horizon)
Executes the HPM algorithm for the simulation and logs various data points.
This method runs the core HPM algorithm, capturing key data points at various stages of the process. The logged data is stored in a dictionary and returned at the end of the method execution.
Parameters
- time_horizonint
Number of timesteps to track the moving entity
Returns
- dict
A dictionary containing logged data from the simulation run. The keys in the dictionary represent different data points, and the values represent the recorded data for those points.
Raises
- SimulationError
If the simulation encounters an error during execution.
Example
log_data = run_alg() print(log_data[“relative_spectral_efficiency”])
- class mlcomm.algorithms.AlgorithmTemplate(params)
Description
AlgorithmTemplate is a class to represent the simulation of an algorithm that interacts with a communication channel and an associated codebook graph.
Attributes
- cb_graphobject
The codebook graph associated with the simulation.
- channelobject
The communication channel used in the simulation.
- best_midxint
midx corresponding to the node with the highest mean_reward
- log_datdict
Algorithm-specific dictionary for storing simulation data.
Methods
- sample(self, node, with_noise=True):
Samples the node’s response with optional noise.
- set_best(self)
sets attribute best_midx, the midx with the largest mean reward
- calculate_relative_spectral_efficiency(self,node)
Calculates the relative spectral efficiency with respect to the node with the highest mean rewards
- calculate_relative_spectral_efficiency(node)
Description
Calculates relative spectral efficiency with respect to node specified and node with highest mean reward, attribute best_node
Parameters
- nodeobject
The node to be used in the relative spectral efficiency calculation.
Returns
- float
The relative spectral efficiency.
- sample(node, transmit_power_dbm=0, with_noise=True, mode='rss')
Description
Samples the node’s response with optional noise.
This method computes the absolute squared value of the conjugate transpose of the node’s field vector multiplied by the channel’s array response. Noise can be optionally included in the computation.
Parameters
- nodeobject
The node to be sampled.
- transmit_power_dbmfloat
Transmit power over the channel in dbw, not required for BasicChannel
- with_noisebool, optional
A flag to indicate whether noise should be included in the sample (default is True).
- modestr
Valid choices are ‘rss’ and ‘complex’, default to ‘rss’. Dictates reward returned, some Bayesian algorithms require complex value.
Returns
- float
The absolute squared value of the sampled response or complex value within.
- class mlcomm.algorithms.DBZ(params)
Dynamic Beam Zooming (DBZ) takes a hierarchical approach to intial alignment and beam tracking for an integrated sensing and communication approach. DBZ uses a base algorithm, Lower-Upper Confidence Bound (LUCB) [1,2] for best arm identification in fixed confidence. May be run in the traditional sense, where all sample over time are considered for the mean, or a sample window length is specified where only recent samples are considered. The latter is used for non-stationary reward structures. Different from [1,2], we are only interested in sets of arms with cardinality 1.
Our confidence term and exploration rate differ due to the reward probability distribution.
[1] Kalyanakrishnan, Shivaram, et al. “PAC subset selection in stochastic multi-armed bandits.” ICML. Vol. 12. 2012. [2] Gabillon, Victor, Mohammad Ghavamzadeh, and Alessandro Lazaric. “Best arm identification: A unified approach to fixed budget and fixed confidence.” Advances in Neural Information Processing Systems 25 (2012).
Parameters
- paramsdict
dict with params
Attributes
- cb_graphobject
The codebook graph associated with the simulation.
- deltafloat
confidence term
- epsilonfloat
tolerance term that is scaled based on the level h
- modestr
Specifies ‘stationary’ or ‘non-stationary’ setting, either or is a valid entry
- pnparray
array of 1s and 0s indicating which levels to play
- Wnparray of ints
array where each element indicates the sample window length associated with the particular level
- afloat
Threshold parameter
- bfloat
First confidence parameter
- cfloat
Secondary confidence parameter
Notes
Example
bandit = DBZ({‘cb_graph’ : cb_graph, ‘channel’ : channel, ‘delta’ : .1, ‘epsilon’, .001, ‘mode’ : stationary, levels_to_play’ : [1,1,1,1], ‘a’ : 1, ‘b’ : .1, ‘c’ : .1})
- get_gamma_u(nn, current_midxs)
Description
Determines the midx corresponding to gamma and u from the LUCB algorithm. gamma and u correspond to the largest LCB and the largest UCB that is not the same midx as gamma.
Parameters
- nnint
time step of the algorithm
- current_midxslist or array
midxs corresponding to cb_graph object attribute in which the bandit game will play those nodes
Notes
- initialize(nn, current_midxs)
Description
Samples all nodes within current_midxs twice, and determines initial gamma and u from the LUCB algorithm. gamma and u correspond to the largest LCB and the largest UCB that is not the same midx as gamma.
Parameters
- current_midxslist or array
midxs corresponding to cb_graph object attribute in which the bandit game will play those nodes
- perform_sample_update_channel(nn, node_to_sample)
Description
Wrapper function to perform operations necessary during sampling.
Notes
Requires 2* self.cb_graph.M +1 flops for the inner product and absolute value squared.
- run_alg(time_horizon)
- sampling_iteration(nn, gamma_midx, u_midx)
Description
Subordinate algorithm for sampling in LUCB algorithms. If there is only one arm being considered for sampling, which will happen when the active beam is one of the narrowest beamforming patterns, sample it. Otherwise, proceed through the selection process outlined in [1,2]
Parameters
- nnint
time step of the algorithm
- gamma_midxint
corresponds to the largest LCB
- u_midxint
corresponds to the largest UCB that is not gamma_midx, determined with method ‘get_gamma_u’
Notes
Requires 2*(2*nodes[u_midx].W + 11) + 1 to compute the confidence term and 2*self.cb_graph.M +1 from
perform_sample_update_channel
- update_node(node, r)
Description
Updates the node’s sample history, and deletes “old” samples as specified by the sampling window length “W”
In the non-stationary setting, this “ages out” older samples of other nodes
Parameters
- nodeobject
The node to be updated.
- rfloat
The new sample reward.
Returns
None
Notes
Simulated buffered memory, not consuming flops
- class mlcomm.algorithms.EKF(params)
Description
Uses the Extended Kalman Filter (EKF) from [1] to select beamforming vectors based on the nearest pointing angle.
[1] V. Va, H. Vikalo and R. W. Heath, “Beam tracking for mobile millimeter wave communication systems,” 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Washington, DC, USA, 2016, pp. 743-747, doi: 10.1109/GlobalSIP.2016.7905941.
Attributes
- Mint
number of antenna elements
- rhofloat
fading coefficient for channel model
- comm_nodeobject
Node from codebook object currently being used to communicate
Methods
- delh(x, real=True, hermitian=False)
Linearization for observation model estimate
- ekf_recursion()
Standard Extended Kalman Filter recursion steps.
- h(x, real=True)
Observation model estimate
- log_and_update_channel(nn)
Description
Wrapper function to perform operations for updating the logs and channel fluctuations
- run_alg(time_horizon)
- update_beam_steering(verbose=False)
If the angle a half beamwidth away in either direction, choose the closest beamforming vector.
- class mlcomm.algorithms.HBA(params)
Description
Implements Hierarchical Beam Alignment (HBA) from [1]
[1] Wu, Wen, et al. “Fast mmwave beam alignment via correlated bandit learning.” IEEE Transactions on Wireless Communications 18.12 (2019): 5894-5908.
Attributes
Methods
Notes
- get_idx(h, j)
- run_alg(time_horizon)
- class mlcomm.algorithms.HOSUB(params)
Description
Hierarchical Optimal Sampling for Unimodal Bandits (HOSUB) [1] is a class that extends AlgorithmTemplate to implement a hierarchical optimization algorithm for selecting the best beamforming vector. HOSUB is find the best arm in a user-designated fixed bugdet of time steps, where one sample is taken each time step.
[1] Blinn, Nathan, Jana Boerger, and Matthieu Bloch. “mmWave Beam Steering with Hierarchical Optimal Sampling for Unimodal Bandits.” ICC 2021-IEEE International Conference on Communications. IEEE, 2021.
Attributes
- cb_graphobject
The codebook graph associated with the simulation.
- Nint
Time horizon for the algorithm.
- h0int
Starting level in the hierarchical structure.
- cfloat
Exploration-exploitation trade-off parameter.
- deltafloat
Confidence parameter for upper confidence bound calculations.
Methods
- __init__(self, params):
Initializes the HOSUB algorithm with the provided parameters.
- run_alg(self):
Runs the HOSUB algorithm to find the best beamforming vector.
- update_node(self, node, r):
Updates the node’s empirical mean reward and upper confidence bound.
- run_alg()
Description
Runs the HOSUB algorithm to find the best beamforming vector.
This method iteratively selects and samples nodes in the codebook graph, updating their empirical mean rewards and upper confidence bounds, until the time horizon is reached or the narrowest beamforming vector is found.
Returns
- int
The midx corresponding to the estimated best beamforming vector as indexed by the codebook graph.
- update_node(node, r)
Description
Updates the node’s empirical mean reward and upper confidence bound.
This method updates the number of pulls, empirical mean reward, and upper confidence bound for the specified node based on the new sample reward.
Parameters
- nodeobject
The node to be updated.
- rfloat
The new sample reward.
Returns
None
- class mlcomm.algorithms.HPM(params)
Description
Implements the Hierarchical Posterior Matching (HPM) [1] algorithm. Given our fixed codebook resolution, we only use the fixed resolution (FR) impelmentation in their algorithm.
- [1] Chiu, Sung-En, Nancy Ronquillo, and Tara Javidi. “Active learning and CSI acquisition for mmWave initial alignment.”
IEEE Journal on Selected Areas in Communications 37.11 (2019): 2474-2489.
Attributes
- deltafloat
The target confidence for the algorithm.
- fading_estimationstr
Fading estimation mode, valid choices are ‘exact’, ‘estimate’, and ‘unitary’
- modestr
Fixed length (FL) or variable length (VL) where the algorithm stops after a certain number of timesteps or after achieving confidence 1-delta
Notes
Works only with binary tree codebook.
- run_alg()
Executes the HPM algorithm for the simulation and logs various data points.
This method runs the core HPM algorithm, capturing key data points at various stages of the process. The logged data is stored in a dictionary and returned at the end of the method execution.
Returns
- dict
A dictionary containing logged data from the simulation run. The keys in the dictionary represent different data points, and the values represent the recorded data for those points.
Raises
- SimulationError
If the simulation encounters an error during execution.
- class mlcomm.algorithms.MotionTS(params)
Description
Phased Track and Stop Beamforming approach from where only recent samples are considered. In the case that an empirical mean chosen is less than any previous level’s, zoom out.
Parameters
- paramsdict
A dictionary of parameters for initializing the NPHTS algorithm. Must include ‘levels_to_play’, ‘delta’, and ‘epsilon’.
- levels_to_play: List[int]
A list indicating the levels to be played in the algorithm.
- delta: List[float]
A list of delta values for each level.
- epsilon: float
The epsilon value for the algorithm.
Raises
- AssertionError
If the length of ‘levels_to_play’ does not match the length of ‘delta’. If the last element of ‘levels_to_play’ is not 1.
Methods
- __init__(self, params):
Initializes the NPHTS algorithm with the provided parameters.
- run_alg(self):
Runs the NPHTS algorithm to find the best beamforming vector.
Notes
Relies on the TASD class for the update_node method.
Slightly modified from the implementation in [3] which has a fixed two-phase approach, whereas here we allow the user to select which levels to play by the key in params ‘levels_to_play’.
- run_alg(time_horizon)
Description
Runs the algorithm to find the best beamforming vector.
This method implements the core logic of the NPHTS algorithm, leveraging the parameters initialized in the __init__ method.
Parameters
- time_horizonint
Number of time steps to run the simulation
Returns
- int
The midx corresponding to the estimated best beamforming vector as indexed by the codebook graph.
- int
The number of samples prior to terminating
- update_node(node, r, current_midxs)
Description
Updates the node’s sample history, and deletes “old” samples as specified by the sampling window length “W”
In the non-stationary setting, this “ages out” older samples of other nodes
Parameters
- nodeobject
The node to be updated.
- rfloat
The new sample reward.
Returns
None
- class mlcomm.algorithms.NPHTS(params)
Description
Phased Track and Stop Beamforming approach from [1]. This class implements the NPHTS algorithm, an extension of the TASD class, where TASD is run for each level specified by attribute p (‘levels_to_play’) in the hierarchical beamforming codebook.
[1] Wei, Yi, Zixin Zhong, and Vincent YF Tan. “Fast beam alignment via pure exploration in multi-armed bandits.” IEEE Transactions on Wireless Communications (2022).
Original code at: https://github.com/YiWei0129/Fast-beam-alignment
Parameters
- paramsdict
A dictionary of parameters for initializing the NPHTS algorithm. Must include ‘levels_to_play’, ‘delta’, and ‘epsilon’.
- levels_to_play: List[int]
A list indicating the levels to be played in the algorithm.
- delta: List[float]
A list of delta values for each level.
- epsilon: float
The epsilon value for the algorithm.
Raises
- AssertionError
If the length of ‘levels_to_play’ does not match the length of ‘delta’. If the last element of ‘levels_to_play’ is not 1.
Methods
- __init__(self, params):
Initializes the NPHTS algorithm with the provided parameters.
- run_alg(self):
Runs the NPHTS algorithm to find the best beamforming vector.
Notes
Relies on the TASD class for the update_node method.
Slightly modified from the implementation in [3] which has a fixed two-phase approach, whereas here we allow the user to select which levels to play by the key in params ‘levels_to_play’.
- run_alg()
Description
Runs the NPHTS algorithm to find the best beamforming vector.
This method implements the core logic of the NPHTS algorithm, leveraging the parameters initialized in the __init__ method.
Returns
- int
The midx corresponding to the estimated best beamforming vector as indexed by the codebook graph.
- int
The number of samples prior to terminating
- class mlcomm.algorithms.OffsetMAB(params)
Description
Implements the Algorithm 1 OR ALgorithm 2 from [1] using the confidence terms from the DBZ paper in stationary and non-stationary environments. Only uses the narrowest beamforming vectors, but the “arms” are actually the offset from the arm being used for communication. In updating the rewards, the node assigned the sample history was rolled from another. This mechanic accounts for the motion, as described in [1].
There are some adaptation differences with aligning the time steps with the use of a beamforming vector. The algorithm in [1] executes a beam sweep of several beamforming vectors at a single time step, which does not match my time scale.
[1] Zhang, Jianjun, et al. “Beam alignment and tracking for millimeter wave communications via bandit learning.” IEEE Transactions on Communications 68.9 (2020): 5519-5533.
Parameters
- paramsdict
dict with params
Attributes
- cb_graphobject
The codebook graph associated with the simulation.
- epsilonfloat
‘epsilon-greed’ policy parameter
- alphafloat
discount paramter for non-stationary rewards
- policystr
Specifies ‘UCB’ or ‘epsilon-greedy’ policy to use for beamforming vector selection
- modestr
Specifies ‘stationary’ or ‘non-stationary’ setting, either or is a valid entry
- cfloat
Secondary confidence parameter
Notes
‘epsilon’ is unused in policy ‘UCB’ and ‘c’ is unused in policy ‘epsilon-greedy’
Example
bandit = OffsetMAB({‘cb_graph’ : cb_graph, ‘channel’ : channel, ‘epsilon’ : .1, ‘alpha’ : 1e-4,’mode’ : ‘non-stationary’,’policy’ : ‘UCB’, ‘c’ : 1})
- perform_sample_update_channel(nn, node_to_sample)
Description
Wrapper function to perform operations necessary during sampling.
Notes
Requires 2* self.cb_graph.M +1 flops for the inner product and absolute value squared.
- run_alg(time_horizon)
- update_current_midxs(est_best_midx)
Description
The ‘current_midxs’ variables is intended to track the midxs associated with the subset codebook, denoted as calF_{u,b}, in [1].
Parameters
- est_best_midxint
midx corresponding to the beamforming vector with the current high average rewards
Returns
- current_midxslist of ints
list of midxs based on the indexing of (u,b) in [1]
- class mlcomm.algorithms.PF(params)
Description
Uses the Particle Filter (EKF) from [1] to select beamforming vectors based on the nearest pointing angle, and brodens the beam based on the number of antenna elements required.
We obtain our initial state estimate by using an exhaustive search with the most narrow beams.
Unlike the original implementation in [1], which turns elements on and off to narrow or broaden the beam, we choose beamforming vectors from the ternary beamforming codebook based on the number of elements in (15) in [1], that possesses the similar beamwidth. For example, consider a ternary hiearchical codebook with beamwidths at each level: [24, 8, 2.67, 0.89] (in degrees), at time step n, the method in [1] determines 128 elments, which corresponds to roughly to the .89 degree beamwidth in our array of beamwidths. After observation at time step n+1, the algorithm determines 32 elements, which 2/32 is approximately a 3.58 degree beamwidth. Because it is wider than 2.67, we broaden the beam to the level with 8 degrees with the beamforming vector whose pattern points in the direction of the current angle estimate and broaden it.
[1] H. Chung, J. Kang, H. Kim, Y. M. Park and S. Kim, “Adaptive Beamwidth Control for mmWave Beam Tracking,” in IEEE Communications Letters, vol. 25, no. 1, pp. 137-141, Jan. 2021, doi: 10.1109/LCOMM.2020.3022877.
Attributes
- Sint
number of particles
- comm_nodeobject
Node from codebook object currently being used to communicate
Methods
- class Particle(index, initial_weight, initial_state)
- get_estimate()
- get_predictions()
- log_and_update_channel(nn)
Description
Wrapper function to perform operations for updating the logs and channel fluctuations
- resample()
Resampling routine from https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python/blob/master/12-Particle-Filters.ipynb
- run_alg(time_horizon)
- update_weights(z)
- zeta_k()
- class mlcomm.algorithms.TASD(params)
Description
Track and Stop D (TASD) [1] algorithm performed on midxs specified from the cb_graph object passed on initialization (see documentaiton for AlgorithmTemplate)
[1] Garivier, Aurélien, and Emilie Kaufmann. “Nonasymptotic sequential tests for overlapping hypotheses applied to near-optimal arm identification in bandit models.” Sequential Analysis 40.1 (2021): 61-96.
Original Julia code at https://github.com/EmilieKaufmann/BAI_epsBAI_code/blob/master/EpsilonBAIalgos.jl
Attributes
- deltafloat
Confidence parameter for upper confidence bound calculations.
- epsilonfloat
Tolerance paramter in which rewards are epsilon-optimal if mu + epsilon>= mu_max
Methods
- __init__(self, params):
Initializes the HOSUB algorithm with the provided parameters.
- run_base_alg(self):
Runs the TASD algorithm to find the best beamforming vector.
- update_node(self, node, r):
Updates the node’s empirical mean reward and upper confidence bound.
- run_base_alg(current_midxs=None, time_horizon=100000, update_logs=True, verbose=False)
Description
Runs the Track and Stop algorithm from [1] with Chernoff stopping + D-Tracking
Parameters
- current_midxsarray or list of ints
master indices from codebook object corresponding to nodes which are played in contention in a MAB game. If left unspecified, then by default all of the master indices belonging to nodes with the narrowest beams are used.
Subordinate Functions
- rate(t, delta):
Computes the rate function.
- d(mu1, mu2):
Computes the divergence between two empirical values.
- lambdaX(x, mua, mub, epsilon, pre=1e-12):
Computes the minimizer for lambda in (mu^- ; mu^+ - epsilon).
- gb(x, mua, mub, epsilon, pre=1e-12):
Computes the minimum value of d(mua, lambda) + d(mub, lambda + epsilon).
- AdmissibleAux(mu, a, epsilon):
Computes the admissible auxiliary values for a given arm.
- xbofy(y, mua, mub, epsilon, pre=1e-12):
Finds x such that g_b(x) = y.
- dicoSolve(f, xMin, xMax, pre=1e-11):
Finds m such that f(m) = 0 using binary search.
- auxEps(y, mu, a, epsilon, pre=1e-12):
Returns F_mu(y) - 1.
- aOpt(mu, a, epsilon, pre=1e-12):
Returns the optimal weights and values for the epsilon optimal arm.
- OptimalWeightsEpsilon(mu, epsilon, pre=1e-11):
Returns T*(mu) and a matrix containing the candidate optimal weights.
- PGLRT(muhat, counts, epsilon, Aeps, K):
Computes the parallel GLRT stopping rule and returns the best arm.
Returns
- int
The midx corresponding to the estimated best beamforming vector as indexed by the codebook graph.
- int
The number of samples prior to terminating
- update_node(node, r, current_midxs)
Description
Updates the node’s empirical mean reward and upper confidence bound.
This method updates the number of pulls, empirical mean reward, and upper confidence bound for the specified node based on the new sample reward.
Parameters
- nodeobject
The node to be updated.
- rfloat
The new sample reward.
Returns
None