Block averages (standard deviation, standard error)

It is typically useful to divide an MD trajectory into non-overlapping segments, and to calculate properties for each segment individually.

You can then calculate the average and standard deviation or standard error of any calculated quantity.

This might be useful to determine if equilibration has been reached, or to get error bars on any calculated quantity.

lionanalysis includes a utility called blocklionanalysis.sh which is located in the utils/ folder of the source tree. It divides the trajectory into N non-overlapping segments and runs the analysis for each segment. All outputs are saved for later analysis.

To proceed, make sure blocklionanalysis.sh is in your PATH.

[2]:
%%sh
blocklionanalysis.sh -h
olblock.sh -h
    show help
olblock.sh -d <dumpfile> -c <configfile> -n <numblocks> [-N <numframes>]  [-X <file-with-numframes>]
olblock.sh -d <dumpfile> -c <configfile> -s <sizeblock> [-N <numframes>]  [-X <file-with-numframes>]
    -n : give the number of blocks (program finds out size of each block)
    -s : give the size of teach block (program finds out the number of blocks)
    -t : run total as well
    -N : specify the total number of frames in <dumpfile> (default is to find out, but can take some time for large dumpfiles)
    -X : the <file-with-numframes> contains a single word with the number of frames in <dumpfile>
[9]:
%%sh
cat > ba1.config <<EOF
Overwrite
Threads 1
CoutFrequency 200

DefineGroup Na ATOMICNUMBER Na
DefineGroup O ATOMICNUMBER O
RDF Na Na rdf_na_na.txt Resolution 0.2 MaxDist 6.0
RDF Na O rdf_na_o.txt Resolution 0.1 MaxDist 6.0
EOF
# specify config file, trajectory, and number of blocks/segments
blocklionanalysis.sh -c ba1.config -d trajectories/NaOH.xyz -n 4

Found 5001 frames in trajectories/NaOH.xyz
#blocks: 4 ; #frames per block: 1250 (block 4: 1251 ); I will start at block 1 and stop at block 4
olflags:
AT BLOCK 1 (size 1250); startbyte 0
WARNING: DumpFile trajectories/NaOH.xyz specified in config file.
WARNING: I will NOT use this file but instead the one specified on command line: trajectories/NaOH.xyz
StartByte 0
DumpFile trajectories/NaOH.xyz
MinTimestep 0
MaxTimestep 1250
Suffix _4_1
Overwrite
Threads 1
CoutFrequency 200
DefineGroup Na ATOMICNUMBER Na
DefineGroup O ATOMICNUMBER O
RDF Na Na rdf_na_na.txt Resolution 0.2 MaxDist 6.0
RDF Na O rdf_na_o.txt Resolution 0.1 MaxDist 6.0
###########################################################
Running on 1 threads on host .SCMCoffee1
There are 3 groups and 5 actions.
I am lionanalysis version 8:10M
Opening dumpfile trajectories/NaOH.xyz at byte 0 : 0
      -999       (1) (       0 ps) (0.000216961 s) (  0.0201 %)
      -800       (200) (     199 ps) (0.091712952 s) (       4 %)
      -600       (400) (     399 ps) (0.1808629 s) (       8 %)
      -400       (600) (     599 ps) (0.26924992 s) (      12 %)
      -200       (800) (     799 ps) (0.35710692 s) (      16 %)
         0       (1000) (     999 ps) (0.44463396 s) (      20 %)
       200       (1200) (    1199 ps) (0.53194404 s) (      24 %)
Closing dumpfile trajectories/NaOH.xyz at byte 6260265
Finished reading dump file... 1250 frames... Writing output....
FIRST TIMESTEP: -999
LAST TIMESTEP: 250
Total time: 0.55458498 s; 0.00044366798 s/timestep
Goodbye!
AT BLOCK 2 (size 1250); startbyte 6260265
StartByte 6260265
DumpFile trajectories/NaOH.xyz
MinTimestep 0
MaxTimestep 1250
Suffix _4_2
Overwrite
Threads 1
CoutFrequency 200
DefineGroup Na ATOMICNUMBER Na
DefineGroup O ATOMICNUMBER O
RDF Na Na rdf_na_na.txt Resolution 0.2 MaxDist 6.0
WARNING: DumpFile trajectories/NaOH.xyz specified in config file.
WARNING: I will NOT use this file but instead the one specified on command line: trajectories/NaOH.xyz
RDF Na O rdf_na_o.txt Resolution 0.1 MaxDist 6.0
###########################################################
Running on 1 threads on host .SCMCoffee1
There are 3 groups and 5 actions.
I am lionanalysis version 8:10M
Opening dumpfile trajectories/NaOH.xyz at byte 6260265 : 6260265
      -999       (1) (       0 ps) (0.000211 s) (      25 %)
      -800       (200) (     199 ps) (0.088485003 s) (      29 %)
      -600       (400) (     399 ps) (0.177001 s) (      33 %)
      -400       (600) (     599 ps) (0.26565289 s) (      37 %)
      -200       (800) (     799 ps) (0.35426211 s) (      41 %)
         0       (1000) (     999 ps) (0.44262195 s) (      45 %)
       200       (1200) (    1199 ps) (0.53090692 s) (      49 %)
Closing dumpfile trajectories/NaOH.xyz at byte 12518905
Finished reading dump file... 1250 frames... Writing output....
FIRST TIMESTEP: -999
LAST TIMESTEP: 250
Total time: 0.55390191 s; 0.00044312153 s/timestep
Goodbye!
AT BLOCK 3 (size 1250); startbyte 12518905
StartByte 12518905
DumpFile trajectories/NaOH.xyz
MinTimestep 0
MaxTimestep 1250
Suffix _4_3
Overwrite
Threads 1
CoutFrequency 200
DefineGroup Na ATOMICNUMBER Na
DefineGroup O ATOMICNUMBER O
RDF Na Na rdf_na_na.txt Resolution 0.2 MaxDist 6.0
WARNING: DumpFile trajectories/NaOH.xyz specified in config file.
WARNING: I will NOT use this file but instead the one specified on command line: trajectories/NaOH.xyz
RDF Na O rdf_na_o.txt Resolution 0.1 MaxDist 6.0
###########################################################
Running on 1 threads on host .SCMCoffee1
There are 3 groups and 5 actions.
I am lionanalysis version 8:10M
Opening dumpfile trajectories/NaOH.xyz at byte 12518905 : 12518905
      -999       (1) (       0 ps) (0.000208855 s) (      50 %)
      -800       (200) (     199 ps) (0.088582993 s) (      54 %)
      -600       (400) (     399 ps) (0.17657089 s) (      58 %)
      -400       (600) (     599 ps) (0.26465487 s) (      62 %)
      -200       (800) (     799 ps) (0.35317087 s) (      66 %)
         0       (1000) (     999 ps) (0.44119096 s) (      70 %)
       200       (1200) (    1199 ps) (0.52900982 s) (      74 %)
Closing dumpfile trajectories/NaOH.xyz at byte 18777733
Finished reading dump file... 1250 frames... Writing output....
FIRST TIMESTEP: -999
LAST TIMESTEP: 250
Total time: 0.55171084 s; 0.00044136868 s/timestep
Goodbye!
AT BLOCK 4 (size 1251); startbyte 18777733
StartByte 18777733
DumpFile trajectories/NaOH.xyz
MinTimestep 0
Suffix _4_4
Overwrite
Threads 1
CoutFrequency 200
DefineGroup Na ATOMICNUMBER Na
DefineGroup O ATOMICNUMBER O
RDF Na Na rdf_na_na.txt Resolution 0.2 MaxDist 6.0
RDF Na O rdf_na_o.txt Resolution 0.1 MaxDist 6.0
###########################################################
Running on 1 threads on host .SCMCoffee1
There are 3 groups and 5 actions.
I am lionanalysis version 8:10M
Opening dumpfile trajectories/NaOH.xyz at byte 18777733 : 18777733
      -999       (1) (       0 ps) (0.000233889 s) (      75 %)
WARNING: DumpFile trajectories/NaOH.xyz specified in config file.
WARNING: I will NOT use this file but instead the one specified on command line: trajectories/NaOH.xyz
      -800       (200) (     199 ps) (0.089896917 s) (      79 %)
      -600       (400) (     399 ps) (0.17751789 s) (      83 %)
      -400       (600) (     599 ps) (0.266675 s) (      87 %)
      -200       (800) (     799 ps) (0.35485387 s) (      91 %)
         0       (1000) (     999 ps) (0.44257784 s) (      95 %)
       200       (1200) (    1199 ps) (0.53165197 s) (      99 %)
Closing dumpfile trajectories/NaOH.xyz at byte -1
Finished reading dump file... 1251 frames... Writing output....
FIRST TIMESTEP: -999
LAST TIMESTEP: 251
Total time: 0.55490279 s; 0.00044356738 s/timestep
Goodbye!

In the input file we specified to store the RDF in rdf_na_o.txt. The block averaging has created four output files, rdf_na_o.txt_4_1, …, rdf_na_o.txt_4_4. The first number is the number of blocks, and the second the block index.

[25]:
from pathlib import Path
from common import rdf2df
import matplotlib.pyplot as plt
import pandas as pd

rdf_files = list(Path(".").glob("rdf_na_o.txt_*_*"))
dfs = [rdf2df(x) for x in rdf_files]

fig, ax = plt.subplots()
for df in dfs:
    df.plot(x=0, y=1, ax=ax, xlabel="r (ang.)", ylabel="g(r)")

ax.legend(rdf_files);
../_images/examples_06a_Block_Averages_4_0.png

With seaborn it is easy to plot the 95% confidence interval as a shaded area:

[28]:
import seaborn as sns
aggregated_df = pd.concat(dfs)

sns.lineplot(aggregated_df, x="#r", y="RDF");
# sns.lineplot(aggregated_df, x="#r", y="RDF", err_style="bars");
../_images/examples_06a_Block_Averages_6_0.png