
-------------------------------------------------------------------

     =========================================================
     Geant4 - an Object-Oriented Toolkit for Simulation in HEP
     =========================================================

                    Example ThreadsafeScorers
                    ----------------------------

 This example demonstrates a very simple application where an energy
 deposit and # of steps is accounted in thread-local (i.e. one instance per 
 thread) hits maps with underlying types of plain-old data (POD) and global
 (i.e. one instance) hits maps with underlying types of atomics.
 The example uses a coarse mesh, extensive physics, and step limiters
 to ensure that there is a higher degree of conflict between threads
 when updating the scorers to test the robustness of the atomics
 classes and maximize the compounding of thread-local round-off error.
    At the end of the simulation, the scorers are printed to
 "mfd_<DATA_TYPE>_<SCORER_TYPE>.out", where DATA_TYPE is either 
 "tl" (thread-local) or "tg" (thread-global) and SCORER_TYPE is "EnergyDeposit"
 or "NumberOfSteps". These values are then compared to a thread-global
 sum of these scorers that were updated via mutex locking. If round-off
 errors in thread-local EnergyDeposit are present, they can be viewed
 in "mfd_diff.out" at the end of the simulation
    This example also provides a demonstration of the TiMemory (timing and
 memory analysis) package provided in Geant4 -- for documentation of TiMemory
 see https://github.com/jrmadsen/TiMemory.

 1- ATOMICS and the ATOMIC SCORERS

   atomics can ONLY handle plain-old data (POD) types, e.g. int, double, etc.
   The implementation of atomics in compiler-dependent. At the very worst,
   the performance of an atomic is the same mutex locking.
   Atomics, in general, are not copy-constructable. This has to do with
   thread safety (e.g. making a copy while another thread tries to update)
   This is why atomics cannot be used in STL containers. The implementation
   in atomic.hh has limited copy-construction and still cannot be used in
   STL containers. Use these copy-constructors with extreme caution. See
   opening comments of G4atomic.hh for more details.
   
   The newly provided classes in this example (G4atomic, G4TAtomicHitsMap, and
   G4TAtomicHitsCollection) are intended for applications where memory is a 
   greater concern than performance. While atomics generally perform better than
   mutex locking, the synchronization is not without a cost. However, since
   the memory consumed by thread-local hits maps scales roughly linearly
   with the number of threads, simulations with a large number of scoring
   volumes can decrease simulation time by increasing the number of threads 
   beyond what was previously allowed due to the increase in memory consumption.
   
   The G4TAtomicHitsMap and G4TAtomicHitsCollection work exactly the same way
   as the standard G4THitsMap and G4THitsCollection, respectively, with the
   exception(s) that you should only implement one instance and provide a 
   pointer/reference of that instance to the threads instead of having the
   threads create them. Additionally, there is no need to include them
   in the G4Run::Merge().
    
 2- GEOMETRY DEFINITION
  
   The geometry is constructed in the TSDetectorConstruction class.
   The setup consists of a box filling the world. The volume is divided into
   subregions, where the outermost boxes are a different material. The materials
   by default are water and boron as these have large scattering cross-sections 
   for neutrons (the default particle). 
    
 3- PHYSICS LIST
 
   The particle's type and the physic processes which will be available
   in this example are set are built from a variety of physics constructors.
   The chosen physics lists are extensive, primarily
   The constructors are:

      G4EmStandardPhysics_option4
      G4DecayPhysics
      G4RadioactiveDecayPhysics
      G4HadronPhysicsQGSP_BERT_HP
      G4HadronElasticPhysicsHP
      G4StepLimiterPhysics
      G4IonElasticPhysics
      G4IonBinaryCascadePhysics

 4- ACTION INITALIZATION

   TSActionInitialization, instantiates and registers to Geant4 kernel
    all user action classes.

   While in sequential mode the action classes are instatiated just once,
   via invoking the method:
      TSActionInitialization::Build()
   in multi-threading mode the same method is invoked for each thread worker
   and so all user action classes are defined thread-local.

   A run action class is instantiated both thread-local 
   and global that's why its instance is created also in the method
      TSActionInitialization::BuildForMaster() 
   which is invoked only in multi-threading mode.
     
 5- PRIMARY GENERATOR
  
   The primary generator is defined in the TSPrimaryGeneratorAction class.
   The default kinematics is a 1 MeV neutron, randomly distributed in front
   of the target across 100% of the transverse (X,Y) target size.
   This default setting can be changed via the Geant4 built-in commands 
   of the G4ParticleGun class.
     
 6- DETECTOR RESPONSE

   This example demonstrates a scoring implemented
   in the user action classes and TSRun object.
   
   The energy deposited is collected per event in the PrimitiveScorer
   G4PSEnergyDeposit (as part of a MultiFunctionalDetector)
   and the thread-local version are merged at the end of the run.
   
   The number of steps is collected per event in the PrimativeScorer
   G4PSNoOfSteps and the thread-local version are merged at the end of the run.

   When the MFD is recording an event i.e. TSRun::RecordEvent(const G4Event*),
   the global atomic hits map adds the same hits collections

   In multi-threading mode the energy accumulated in TSRun MFD object per
   workers is merged to the master in TSRun::Merge().

   TSRun contains five hits collections types:
       1) a thread-local hits map,
       2) a global atomic hits map
       3) a global "mutex" hits map
       4) a global G4StatAnalysis hits deque
       5) a global G4ConvergenceTester hits deque

   The thread-local hits map is the same as you will find in many other
       examples.

   The atomics hits map is the purpose of this example. Code-wise, the
       implementation looks extremely similar to the thread-local version with
       3 primary exceptions:
       (1) construction - there should only be one instance so it should be a
           static member variable or a pointer/reference to a single instance
       (2) It does not need to, nor should be, summed in G4Run::Merge()
       (3) destruction -- it should only be cleared by the master thread since
           there is only one instance.

   The "mutex" hits map is also included as reference for checking the results
       accumulated by the thread-local hits maps and atomic hits maps. The
       differences w.r.t. this hits maps are computed in
       TSRunAction::EndOfRunAction

   The "G4StatAnalysis" and "G4ConvergenceTester" hits deques are
       memory-efficient version of the standard G4THitsMap. While maps are
       ideal for scoring at the G4Event-level, where sparsity w.r.t. indices
       is common; at the G4Run-level, these data structures require much
       less memory overhead. Due to a lack of
       G4ConvergenceTester::operator+=(G4ConvergenceTester), the static version
       of G4ConvergenceTester is the only valid way to use G4ConvergenceTester
       in a scoring container. This is not the case for G4StatAnalysis, which
       can be used in lieu of G4double.

7- HOW TO RUN

    - Execute ts_scorers in the 'interactive mode' with visualization:
        % ./ts_scorers
      and type in the commands from run.mac line by line:
        Idle> /control/verbose 2
        Idle> /tracking/verbose 1
        Idle> /run/beamOn 10 
        Idle> ...
        Idle> exit
      or
        Idle> /control/execute run.mac
        ....
        Idle> exit

    - Execute ts_scorers in the 'batch' mode from macro files
      (without visualization)
        % ./ts_scorers run.mac
        % ./ts_scorers run.mac > run.out

8- TIMEMORY USAGE

    This example demonstrates timing and memory analysis with TiMemory
    (https://github.com/jrmadsen/TiMemory).

    - Compile Geant4 with TiMemory (-DGEANT4_USE_TIMEMORY=ON)
    - TiMemory auto-timer provide timing within the Geant4 source code
      and within the example (TSRun::RecordEvent)
    - Analysis is echoed to stdout, recorded in ts_scorers.out, and
      serialized in ts_scorers.json
    - Generates plots:
      "timemory-plotter -f ts_scorers.json -t "ThreadSafe Scorers" -o plots -e"
    - Uploads plots to CDash if enabled
