mlcomm.algorithms

Adaptive Communication Decision and Information Systems (ACDIS) - User License https://bloch.ece.gatech.edu/researchgroup/

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. Users shall cite ACDIS publications regarding this work.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER INANACTION OF CONTRACT TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

class mlcomm.algorithms.ABT(params)

Adaptive Beam Tracking (ABT) from [1], uses initial alignment from HPM algorithm in [2]

[1] Ronquillo, Nancy, and Tara Javidi. “Active beam tracking under stochastic mobility.” ICC 2021-IEEE International Conference on Communications. IEEE, 2021. [2] Chiu, Sung-En, Nancy Ronquillo, and Tara Javidi. “Active learning and CSI acquisition for mmWave initial alignment.” IEEE Journal on Selected Areas in Communications 37.11 (2019): 2474-2489.

Notes

run_alg(time_horizon)

Executes the HPM algorithm for the simulation and logs various data points.

This method runs the core HPM algorithm, capturing key data points at various stages of the process. The logged data is stored in a dictionary and returned at the end of the method execution.

Parameters

time_horizonint: Number of timesteps to track the moving entity

Returns

dict: A dictionary containing logged data from the simulation run. The keys in the dictionary represent different data points, and the values represent the recorded data for those points.

Raises

SimulationError: If the simulation encounters an error during execution.

Example

log_data = run_alg() print(log_data[“relative_spectral_efficiency”])

class mlcomm.algorithms.AlgorithmTemplate(params)

Description

AlgorithmTemplate is a class to represent the simulation of an algorithm that interacts with a communication channel and an associated codebook graph.

Attributes

cb_graphobject: The codebook graph associated with the simulation.
channelobject: The communication channel used in the simulation.
best_midxint: midx corresponding to the node with the highest mean_reward
log_datdict: Algorithm-specific dictionary for storing simulation data.

Methods

sample(self, node, with_noise=True):: Samples the node’s response with optional noise.
set_best(self): sets attribute best_midx, the midx with the largest mean reward
calculate_relative_spectral_efficiency(self,node): Calculates the relative spectral efficiency with respect to the node with the highest mean rewards

calculate_relative_spectral_efficiency(node)

Description

Calculates relative spectral efficiency with respect to node specified and node with highest mean reward, attribute best_node

Parameters

nodeobject: The node to be used in the relative spectral efficiency calculation.

Returns

float: The relative spectral efficiency.

sample(node, transmit_power_dbm=0, with_noise=True, mode='rss')

Description

Samples the node’s response with optional noise.

This method computes the absolute squared value of the conjugate transpose of the node’s field vector multiplied by the channel’s array response. Noise can be optionally included in the computation.

Parameters

nodeobject: The node to be sampled.
transmit_power_dbmfloat: Transmit power over the channel in dbw, not required for BasicChannel
with_noisebool, optional: A flag to indicate whether noise should be included in the sample (default is True).
modestr: Valid choices are ‘rss’ and ‘complex’, default to ‘rss’. Dictates reward returned, some Bayesian algorithms require complex value.

Returns

float: The absolute squared value of the sampled response or complex value within.

set_best(): Description

Sets the attribute best_midx, which is the midx belonging to the node with the highest mean reward.

class mlcomm.algorithms.DBZ(params)

Dynamic Beam Zooming (DBZ) takes a hierarchical approach to intial alignment and beam tracking for an integrated sensing and communication approach. DBZ uses a base algorithm, Lower-Upper Confidence Bound (LUCB) [1,2] for best arm identification in fixed confidence. May be run in the traditional sense, where all sample over time are considered for the mean, or a sample window length is specified where only recent samples are considered. The latter is used for non-stationary reward structures. Different from [1,2], we are only interested in sets of arms with cardinality 1.

Our confidence term and exploration rate differ due to the reward probability distribution.

[1] Kalyanakrishnan, Shivaram, et al. “PAC subset selection in stochastic multi-armed bandits.” ICML. Vol. 12. 2012. [2] Gabillon, Victor, Mohammad Ghavamzadeh, and Alessandro Lazaric. “Best arm identification: A unified approach to fixed budget and fixed confidence.” Advances in Neural Information Processing Systems 25 (2012).

Parameters

paramsdict: dict with params

Attributes

cb_graphobject: The codebook graph associated with the simulation.
deltafloat: confidence term
epsilonfloat: tolerance term that is scaled based on the level h
modestr: Specifies ‘stationary’ or ‘non-stationary’ setting, either or is a valid entry
pnparray: array of 1s and 0s indicating which levels to play
Wnparray of ints: array where each element indicates the sample window length associated with the particular level
afloat: Threshold parameter
bfloat: First confidence parameter
cfloat: Secondary confidence parameter

Notes

Example

bandit = DBZ({‘cb_graph’ : cb_graph, ‘channel’ : channel, ‘delta’ : .1, ‘epsilon’, .001, ‘mode’ : stationary, levels_to_play’ : [1,1,1,1], ‘a’ : 1, ‘b’ : .1, ‘c’ : .1})

get_gamma_u(nn, current_midxs)

Description

Determines the midx corresponding to gamma and u from the LUCB algorithm. gamma and u correspond to the largest LCB and the largest UCB that is not the same midx as gamma.

Parameters

nnint: time step of the algorithm
current_midxslist or array: midxs corresponding to cb_graph object attribute in which the bandit game will play those nodes

Notes

initialize(nn, current_midxs)

Description

Samples all nodes within current_midxs twice, and determines initial gamma and u from the LUCB algorithm. gamma and u correspond to the largest LCB and the largest UCB that is not the same midx as gamma.

Parameters

current_midxslist or array: midxs corresponding to cb_graph object attribute in which the bandit game will play those nodes

perform_sample_update_channel(nn, node_to_sample): Description

Wrapper function to perform operations necessary during sampling.

Notes

Requires 2* self.cb_graph.M +1 flops for the inner product and absolute value squared.

run_alg(time_horizon)

sampling_iteration(nn, gamma_midx, u_midx)

Description

Subordinate algorithm for sampling in LUCB algorithms. If there is only one arm being considered for sampling, which will happen when the active beam is one of the narrowest beamforming patterns, sample it. Otherwise, proceed through the selection process outlined in [1,2]

Parameters

nnint: time step of the algorithm
gamma_midxint: corresponds to the largest LCB
u_midxint: corresponds to the largest UCB that is not gamma_midx, determined with method ‘get_gamma_u’

Notes

Requires 2*(2*nodes[u_midx].W + 11) + 1 to compute the confidence term and 2*self.cb_graph.M +1 from perform_sample_update_channel

update_node(node, r)

Description

Updates the node’s sample history, and deletes “old” samples as specified by the sampling window length “W”

In the non-stationary setting, this “ages out” older samples of other nodes

Parameters

nodeobject: The node to be updated.
rfloat: The new sample reward.

Returns

None

Notes

Simulated buffered memory, not consuming flops

class mlcomm.algorithms.EKF(params)

Description

Uses the Extended Kalman Filter (EKF) from [1] to select beamforming vectors based on the nearest pointing angle.

[1] V. Va, H. Vikalo and R. W. Heath, “Beam tracking for mobile millimeter wave communication systems,” 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Washington, DC, USA, 2016, pp. 743-747, doi: 10.1109/GlobalSIP.2016.7905941.

Attributes

Mint: number of antenna elements
rhofloat: fading coefficient for channel model
comm_nodeobject: Node from codebook object currently being used to communicate

Methods

delh(x, real=True, hermitian=False): Linearization for observation model estimate

ekf_recursion(): Standard Extended Kalman Filter recursion steps.

h(x, real=True): Observation model estimate

log_and_update_channel(nn): Description

Wrapper function to perform operations for updating the logs and channel fluctuations

run_alg(time_horizon)

update_beam_steering(verbose=False): If the angle a half beamwidth away in either direction, choose the closest beamforming vector.

class mlcomm.algorithms.HBA(params)

Description

Implements Hierarchical Beam Alignment (HBA) from [1]

[1] Wu, Wen, et al. “Fast mmwave beam alignment via correlated bandit learning.” IEEE Transactions on Wireless Communications 18.12 (2019): 5894-5908.

Attributes

Methods

Notes

get_idx(h, j)

run_alg(time_horizon)

class mlcomm.algorithms.HOSUB(params)

Description

Hierarchical Optimal Sampling for Unimodal Bandits (HOSUB) [1] is a class that extends AlgorithmTemplate to implement a hierarchical optimization algorithm for selecting the best beamforming vector. HOSUB is find the best arm in a user-designated fixed bugdet of time steps, where one sample is taken each time step.

[1] Blinn, Nathan, Jana Boerger, and Matthieu Bloch. “mmWave Beam Steering with Hierarchical Optimal Sampling for Unimodal Bandits.” ICC 2021-IEEE International Conference on Communications. IEEE, 2021.

Attributes

cb_graphobject: The codebook graph associated with the simulation.
Nint: Time horizon for the algorithm.
h0int: Starting level in the hierarchical structure.
cfloat: Exploration-exploitation trade-off parameter.
deltafloat: Confidence parameter for upper confidence bound calculations.

Methods

__init__(self, params):: Initializes the HOSUB algorithm with the provided parameters.
run_alg(self):: Runs the HOSUB algorithm to find the best beamforming vector.
update_node(self, node, r):: Updates the node’s empirical mean reward and upper confidence bound.

run_alg()

Description

Runs the HOSUB algorithm to find the best beamforming vector.

This method iteratively selects and samples nodes in the codebook graph, updating their empirical mean rewards and upper confidence bounds, until the time horizon is reached or the narrowest beamforming vector is found.

Returns

int: The midx corresponding to the estimated best beamforming vector as indexed by the codebook graph.

update_node(node, r)

Description

Updates the node’s empirical mean reward and upper confidence bound.

This method updates the number of pulls, empirical mean reward, and upper confidence bound for the specified node based on the new sample reward.

Parameters

nodeobject: The node to be updated.
rfloat: The new sample reward.

Returns

None

class mlcomm.algorithms.HPM(params)

Description

Implements the Hierarchical Posterior Matching (HPM) [1] algorithm. Given our fixed codebook resolution, we only use the fixed resolution (FR) impelmentation in their algorithm.

[1] Chiu, Sung-En, Nancy Ronquillo, and Tara Javidi. “Active learning and CSI acquisition for mmWave initial alignment.”: IEEE Journal on Selected Areas in Communications 37.11 (2019): 2474-2489.

Attributes

deltafloat: The target confidence for the algorithm.
fading_estimationstr: Fading estimation mode, valid choices are ‘exact’, ‘estimate’, and ‘unitary’
modestr: Fixed length (FL) or variable length (VL) where the algorithm stops after a certain number of timesteps or after achieving confidence 1-delta

Notes

Works only with binary tree codebook.

run_alg()

Executes the HPM algorithm for the simulation and logs various data points.

This method runs the core HPM algorithm, capturing key data points at various stages of the process. The logged data is stored in a dictionary and returned at the end of the method execution.

Returns

dict: A dictionary containing logged data from the simulation run. The keys in the dictionary represent different data points, and the values represent the recorded data for those points.

Raises

SimulationError: If the simulation encounters an error during execution.

class mlcomm.algorithms.MotionTS(params)

Description

Phased Track and Stop Beamforming approach from where only recent samples are considered. In the case that an empirical mean chosen is less than any previous level’s, zoom out.

Parameters

paramsdict

A dictionary of parameters for initializing the NPHTS algorithm. Must include ‘levels_to_play’, ‘delta’, and ‘epsilon’.

levels_to_play: List[int]
A list indicating the levels to be played in the algorithm.
delta: List[float]
A list of delta values for each level.
epsilon: float
The epsilon value for the algorithm.

Raises

AssertionError: If the length of ‘levels_to_play’ does not match the length of ‘delta’. If the last element of ‘levels_to_play’ is not 1.

Methods

__init__(self, params):: Initializes the NPHTS algorithm with the provided parameters.
run_alg(self):: Runs the NPHTS algorithm to find the best beamforming vector.

Notes

Relies on the TASD class for the update_node method.

Slightly modified from the implementation in [3] which has a fixed two-phase approach, whereas here we allow the user to select which levels to play by the key in params ‘levels_to_play’.

run_alg(time_horizon)

Description

Runs the algorithm to find the best beamforming vector.

This method implements the core logic of the NPHTS algorithm, leveraging the parameters initialized in the __init__ method.

Parameters

time_horizonint: Number of time steps to run the simulation

Returns

int: The midx corresponding to the estimated best beamforming vector as indexed by the codebook graph.
int: The number of samples prior to terminating

update_node(node, r, current_midxs)

Description

Updates the node’s sample history, and deletes “old” samples as specified by the sampling window length “W”

In the non-stationary setting, this “ages out” older samples of other nodes

Parameters

nodeobject: The node to be updated.
rfloat: The new sample reward.

Returns

None

class mlcomm.algorithms.NPHTS(params)

Description

Phased Track and Stop Beamforming approach from [1]. This class implements the NPHTS algorithm, an extension of the TASD class, where TASD is run for each level specified by attribute p (‘levels_to_play’) in the hierarchical beamforming codebook.

[1] Wei, Yi, Zixin Zhong, and Vincent YF Tan. “Fast beam alignment via pure exploration in multi-armed bandits.” IEEE Transactions on Wireless Communications (2022).

Original code at: https://github.com/YiWei0129/Fast-beam-alignment

Parameters

paramsdict

A dictionary of parameters for initializing the NPHTS algorithm. Must include ‘levels_to_play’, ‘delta’, and ‘epsilon’.

levels_to_play: List[int]
A list indicating the levels to be played in the algorithm.
delta: List[float]
A list of delta values for each level.
epsilon: float
The epsilon value for the algorithm.

Raises

AssertionError: If the length of ‘levels_to_play’ does not match the length of ‘delta’. If the last element of ‘levels_to_play’ is not 1.

Methods

__init__(self, params):: Initializes the NPHTS algorithm with the provided parameters.
run_alg(self):: Runs the NPHTS algorithm to find the best beamforming vector.

Notes

Relies on the TASD class for the update_node method.

Slightly modified from the implementation in [3] which has a fixed two-phase approach, whereas here we allow the user to select which levels to play by the key in params ‘levels_to_play’.

run_alg()

Description

Runs the NPHTS algorithm to find the best beamforming vector.

This method implements the core logic of the NPHTS algorithm, leveraging the parameters initialized in the __init__ method.

Returns

int: The midx corresponding to the estimated best beamforming vector as indexed by the codebook graph.
int: The number of samples prior to terminating

class mlcomm.algorithms.OffsetMAB(params)

Description

Implements the Algorithm 1 OR ALgorithm 2 from [1] using the confidence terms from the DBZ paper in stationary and non-stationary environments. Only uses the narrowest beamforming vectors, but the “arms” are actually the offset from the arm being used for communication. In updating the rewards, the node assigned the sample history was rolled from another. This mechanic accounts for the motion, as described in [1].

There are some adaptation differences with aligning the time steps with the use of a beamforming vector. The algorithm in [1] executes a beam sweep of several beamforming vectors at a single time step, which does not match my time scale.

[1] Zhang, Jianjun, et al. “Beam alignment and tracking for millimeter wave communications via bandit learning.” IEEE Transactions on Communications 68.9 (2020): 5519-5533.

Parameters

paramsdict: dict with params

Attributes

cb_graphobject: The codebook graph associated with the simulation.
epsilonfloat: ‘epsilon-greed’ policy parameter
alphafloat: discount paramter for non-stationary rewards
policystr: Specifies ‘UCB’ or ‘epsilon-greedy’ policy to use for beamforming vector selection
modestr: Specifies ‘stationary’ or ‘non-stationary’ setting, either or is a valid entry
cfloat: Secondary confidence parameter

Notes

‘epsilon’ is unused in policy ‘UCB’ and ‘c’ is unused in policy ‘epsilon-greedy’

Example

bandit = OffsetMAB({‘cb_graph’ : cb_graph, ‘channel’ : channel, ‘epsilon’ : .1, ‘alpha’ : 1e-4,’mode’ : ‘non-stationary’,’policy’ : ‘UCB’, ‘c’ : 1})

perform_sample_update_channel(nn, node_to_sample): Description

Wrapper function to perform operations necessary during sampling.

Notes

Requires 2* self.cb_graph.M +1 flops for the inner product and absolute value squared.

run_alg(time_horizon)

update_current_midxs(est_best_midx)

Description

The ‘current_midxs’ variables is intended to track the midxs associated with the subset codebook, denoted as calF_{u,b}, in [1].

Parameters

est_best_midxint: midx corresponding to the beamforming vector with the current high average rewards

Returns

current_midxslist of ints: list of midxs based on the indexing of (u,b) in [1]

class mlcomm.algorithms.PF(params)

Description

Uses the Particle Filter (EKF) from [1] to select beamforming vectors based on the nearest pointing angle, and brodens the beam based on the number of antenna elements required.

We obtain our initial state estimate by using an exhaustive search with the most narrow beams.

Unlike the original implementation in [1], which turns elements on and off to narrow or broaden the beam, we choose beamforming vectors from the ternary beamforming codebook based on the number of elements in (15) in [1], that possesses the similar beamwidth. For example, consider a ternary hiearchical codebook with beamwidths at each level: [24, 8, 2.67, 0.89] (in degrees), at time step n, the method in [1] determines 128 elments, which corresponds to roughly to the .89 degree beamwidth in our array of beamwidths. After observation at time step n+1, the algorithm determines 32 elements, which 2/32 is approximately a 3.58 degree beamwidth. Because it is wider than 2.67, we broaden the beam to the level with 8 degrees with the beamforming vector whose pattern points in the direction of the current angle estimate and broaden it.

[1] H. Chung, J. Kang, H. Kim, Y. M. Park and S. Kim, “Adaptive Beamwidth Control for mmWave Beam Tracking,” in IEEE Communications Letters, vol. 25, no. 1, pp. 137-141, Jan. 2021, doi: 10.1109/LCOMM.2020.3022877.

Attributes

Sint: number of particles
comm_nodeobject: Node from codebook object currently being used to communicate

Methods

class Particle(index, initial_weight, initial_state)

get_estimate()

get_predictions()

log_and_update_channel(nn): Description

Wrapper function to perform operations for updating the logs and channel fluctuations

resample(): Resampling routine from https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python/blob/master/12-Particle-Filters.ipynb

run_alg(time_horizon)

update_weights(z)

zeta_k()

class mlcomm.algorithms.TASD(params)

Description

Track and Stop D (TASD) [1] algorithm performed on midxs specified from the cb_graph object passed on initialization (see documentaiton for AlgorithmTemplate)

[1] Garivier, Aurélien, and Emilie Kaufmann. “Nonasymptotic sequential tests for overlapping hypotheses applied to near-optimal arm identification in bandit models.” Sequential Analysis 40.1 (2021): 61-96.

Original Julia code at https://github.com/EmilieKaufmann/BAI_epsBAI_code/blob/master/EpsilonBAIalgos.jl

Attributes

deltafloat: Confidence parameter for upper confidence bound calculations.
epsilonfloat: Tolerance paramter in which rewards are epsilon-optimal if mu + epsilon>= mu_max

Methods

__init__(self, params):: Initializes the HOSUB algorithm with the provided parameters.
run_base_alg(self):: Runs the TASD algorithm to find the best beamforming vector.
update_node(self, node, r):: Updates the node’s empirical mean reward and upper confidence bound.

run_base_alg(current_midxs=None, time_horizon=100000, update_logs=True, verbose=False)

Description

Runs the Track and Stop algorithm from [1] with Chernoff stopping + D-Tracking

Parameters

current_midxsarray or list of ints: master indices from codebook object corresponding to nodes which are played in contention in a MAB game. If left unspecified, then by default all of the master indices belonging to nodes with the narrowest beams are used.

Subordinate Functions

rate(t, delta):: Computes the rate function.
d(mu1, mu2):: Computes the divergence between two empirical values.
lambdaX(x, mua, mub, epsilon, pre=1e-12):: Computes the minimizer for lambda in (mu^- ; mu^+ - epsilon).
gb(x, mua, mub, epsilon, pre=1e-12):: Computes the minimum value of d(mua, lambda) + d(mub, lambda + epsilon).
AdmissibleAux(mu, a, epsilon):: Computes the admissible auxiliary values for a given arm.
xbofy(y, mua, mub, epsilon, pre=1e-12):: Finds x such that g_b(x) = y.
dicoSolve(f, xMin, xMax, pre=1e-11):: Finds m such that f(m) = 0 using binary search.
auxEps(y, mu, a, epsilon, pre=1e-12):: Returns F_mu(y) - 1.
aOpt(mu, a, epsilon, pre=1e-12):: Returns the optimal weights and values for the epsilon optimal arm.
OptimalWeightsEpsilon(mu, epsilon, pre=1e-11):: Returns T*(mu) and a matrix containing the candidate optimal weights.
PGLRT(muhat, counts, epsilon, Aeps, K):: Computes the parallel GLRT stopping rule and returns the best arm.

Returns

int: The midx corresponding to the estimated best beamforming vector as indexed by the codebook graph.
int: The number of samples prior to terminating

update_node(node, r, current_midxs)

Description

Updates the node’s empirical mean reward and upper confidence bound.

This method updates the number of pulls, empirical mean reward, and upper confidence bound for the specified node based on the new sample reward.

Parameters

nodeobject: The node to be updated.
rfloat: The new sample reward.

Returns

None