
Rust – Arrays? Make chains, not concat!

If your application needs to iterate over items from different sources or arrays, someone with a C/C++ background might copy all items into a single vector and iterate over that vector.

This strategy is costly, as it allocates heap memory for the contiguous vector buffer.

Instead, keep the data where it is, and chain it together to form an iterator over a virtual array.

The following Rust code demonstrates chaining multiple arrays into a single iterator, without any additional heap allocation for a vector buffer.

Note, the zip operation in the following code snippet pairs each item with a slot in the buffer. The zip stops as soon as either end is reached, the end of chained_byte_sequence or the end of the request buffer, whichever comes first. However, the length of chained_byte_sequence might be unknown, and only a single pass shall be performed. So how do we know whether all items have been written, and how many? The solution is to borrow the iterator via chained_byte_sequence.by_ref(), iterate over the chain, and finally verify whether any item remained in the sequence via chained_byte_sequence.next() != None.

    let req_method_get = b"GET ";
    let _req_method_head = b"HEAD "; // alternative request method, unused in this example
    let uri = b"/infotext.html HTTP/1.1";

    // destination buffer on the stack, no heap allocation involved
    let mut request = [0u8; 128];

    // chaining various byte-arrays together, for example two here:
    let mut chained_byte_sequence = req_method_get.iter()
        .chain(uri.iter());

    // take a ref and zip against the destination buffer;
    // while iterating via fold, each written element is counted.
    let nwritten =
        chained_byte_sequence
            .by_ref()
            .zip(request.iter_mut())
            .fold(0, |cnt, (item, slot)| {
                *slot = *item;
                cnt + 1
            });

    // finally, verify the iterator of the chained_byte_sequence is empty
    if chained_byte_sequence.next() != None {
        /* slice too short */ panic!();
    } else {
        println!("{:?}", &request[0..nwritten]);
    };
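
The same pattern can be wrapped into a small, reusable helper. The following is a minimal sketch; the function fill_from_chain and its error type are illustrative, not part of any library:

    // Hypothetical helper: copy all items of an iterator into the given buffer,
    // returning the number of bytes written, or Err(()) if the buffer is too short.
    fn fill_from_chain<'a, I>(mut src: I, buf: &mut [u8]) -> Result<usize, ()>
    where
        I: Iterator<Item = &'a u8>,
    {
        let nwritten = src
            .by_ref()
            .zip(buf.iter_mut())
            .fold(0, |cnt, (item, slot)| {
                *slot = *item;
                cnt + 1
            });
        // if the source still yields items, the buffer was too short
        if src.next().is_some() {
            Err(())
        } else {
            Ok(nwritten)
        }
    }

    // usage: chain "GET " and the URI, writing into a stack buffer
    let mut request = [0u8; 128];
    let chained = b"GET ".iter().chain(b"/infotext.html HTTP/1.1".iter());
    let nwritten = fill_from_chain(chained, &mut request).expect("slice too short");
    println!("{:?}", &request[0..nwritten]);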

Rust – Handling Executables and their Debug-Symbols

This post is about compiling Rust code: the executables, the handling of the corresponding debug symbols, build-ids, and core files. It highlights the importance of debug symbols for debugging and shows how to strip them off the binary before shipping to the customer.

Let’s re-use the existing cargo Rust project https://github.com/frehberg/rust-releasetag/tree/master/ for the following samples. This project enables us to produce crash core files. A simplified main.rs looks like this:

use std::time::Duration; 
use std::thread;
use std::io::stdout;
use std::io::Write;

fn main() {
    println!("Waiting until being aborted");
    loop {
      thread::sleep(Duration::from_millis(200));
      print!(".");
      stdout().flush().ok();
    }
}

The command cargo build will produce a debug binary

-rwxr-xr-x 2 frehberg frehberg 1831056 Mai 14 21:22 target/debug/test-tag

And the command cargo build --release will produce a release binary

-rwxr-xr-x 2 frehberg frehberg 1698344 Mai 14 21:22 target/release/test-tag

The two differ by only about 130 KBytes; both still contain debug symbols.

The following command will produce a release binary, stripping away the debug symbols

RUSTFLAGS='-C link-arg=-s' cargo build --release

and the resulting binary will be only a fraction of the size

-rwxr-xr-x 2 frehberg frehberg 198992 Mai 14 21:27 target/release/test-tag

So why does cargo keep the debug symbols? The simple answer: those debug symbols are required by debugging tools like gdb to map the blocks of machine instructions back onto the original Rust source code. The debug symbols cover all of the Rust code compiled into the binary, which explains the additional size and overhead. This is best demonstrated with the debug build, as the release build optimizes and re-orders the instructions.

$ cargo build
...  
$ gdb target/debug/test-tag 
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from target/debug/test-tag...done.
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /home/frehberg/src/proj-releasetag/rust-releasetag/test/target/debug/test-tag.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) l
1    use std::time::Duration;
2    use std::thread;
3    use std::io::stdout;
4    use std::io::Write;
5    
6    fn main() {
7        println!("Waiting until being aborted");
8        loop {
9          thread::sleep(Duration::from_millis(200));
10          print!(".");
(gdb) quit

Only stripped binaries should be shipped to customers, but the corresponding debug symbols should be kept at hand for debugging purposes. Linux provides command-line tools to strip the debug symbols from binaries.

Note: At this point, it is important to understand that each binary produced by the compilers gcc or llvm (rustc) is tagged with a unique sha1 BuildId, and this id can be extracted from the binary using the tool file (see the gdb documentation). The following binary contains the BuildId 11e0989b7ecb6ec4cd87c526a1dcd7ba3a2a81f5:

$ file target/release/test-tag 
target/release/test-tag: ELF 64-bit
LSB shared object, x86-64, version 1 (SYSV), 
dynamically linked, interpreter /lib64/l, 
for GNU/Linux 3.2.0, 
BuildID[sha1]=11e0989b7ecb6ec4cd87c526a1dcd7ba3a2a81f5, 
with debug_info, not stripped

To avoid erroneous debugging sessions, the debugging tools enforce that the build-ids of the executable and of the debug symbols are identical. Depending on compiler version, build flags, etc., the build-id may change for identical Rust code.

As small changes (and possibly timestamps) will influence and change the BuildId, a single build pipeline should be used to produce a file containing both debug symbols and executable code. At the end of the build process these files should be archived, and the stripped variants should be derived from them for delivery.

Lemma: Never embed build timestamps or other dynamic environment values into your code, as two builds of identical sources would result in slightly different binaries containing different BuildIds. It would then not be possible to rebuild a specific release branch of your repo and use the resulting binaries and debug symbols to analyze a core file from a crash report.
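
As an illustration of what to embed instead, prefer values that are stable across rebuilds of the same sources, for example the crate version taken from Cargo.toml; a minimal sketch:

// CARGO_PKG_VERSION is set by cargo at compile time from Cargo.toml; rebuilding
// the same sources embeds the same string, so the resulting binaries (and their
// BuildIds) stay reproducible.
const VERSION: &str = env!("CARGO_PKG_VERSION");

fn main() {
    println!("test-tag version {}", VERSION);
}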

The presence of debug symbols even in cargo’s release build permits archiving the binary together with its debug symbols and stripping the debug symbols off before shipping to the customer, as demonstrated by the following commands.

cargo build --release
cp target/release/test-tag  test-tag.dbg
strip target/release/test-tag
ls -al test-tag.dbg target/release/test-tag
-rwxr-xr-x 2 frehberg frehberg  198992 Mai 14 22:38 target/release/test-tag
-rwxr-xr-x 1 frehberg frehberg 1502400 Mai 14 22:38 test-tag.dbg

The tool file will prove that the executable no longer contains any debug symbols. Please note that both files still contain the identical BuildID[sha1] 11e0989b7ecb6ec4cd87c526a1dcd7ba3a2a81f5

$ file target/release/test-tag
target/release/test-tag: ELF 64-bit LSB shared object, 
x86-64, version 1 (SYSV), dynamically linked, 
interpreter /lib64/l, for GNU/Linux 3.2.0,
BuildID[sha1]=11e0989b7ecb6ec4cd87c526a1dcd7ba3a2a81f5, stripped

$ file test-tag.dbg 
test-tag.dbg: ELF 64-bit LSB shared object, 
x86-64, version 1 (SYSV), dynamically linked, 
interpreter /lib64/l, for GNU/Linux 3.2.0,
BuildID[sha1]=11e0989b7ecb6ec4cd87c526a1dcd7ba3a2a81f5, with debug_info, not stripped

Extracting only the debug symbols into a separate file can be performed with the following command

objcopy --only-keep-debug target/release/test-tag test-tag.dbg

Note: I cannot recommend archiving the executable and the debug symbols in separate files, as gdb often has problems merging the two back together in later debugging sessions, when fed the two files as follows

gdb -s test-tag.dbg -e target/release/test-tag 

Assuming the debug symbols and the executable are kept in a single file test-tag.dbg, it is possible to run and step through the code with gdb, simply with the following command

$ gdb test-tag.dbg 
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from test-tag.dbg...done.
(gdb) run
Starting program: /home/frehberg/src/proj-releasetag/rust-releasetag/test/test-tag.dbg 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Waiting until being aborted
..........^C
Program received signal SIGINT, Interrupt.
0x00007ffff77bbc31 in __GI___nanosleep (requested_time=0x7fffffffd8c0, remaining=0x7fffffffd8c0)
    at ../sysdeps/unix/sysv/linux/nanosleep.c:28
28	../sysdeps/unix/sysv/linux/nanosleep.c: No such file or directory.
(gdb) bt
#0  0x00007ffff77bbc31 in __GI___nanosleep (requested_time=0x7fffffffd8c0, 
    remaining=0x7fffffffd8c0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1  0x000055555555a8f1 in sleep () at src/libstd/sys/unix/thread.rs:153
#2  sleep () at src/libstd/thread/mod.rs:780
#3  0x0000555555558081 in test_tag::main ()
#4  0x0000555555558673 in std::rt::lang_start::{{closure}} ()
#5  0x00005555555627c3 in {{closure}} () at src/libstd/rt.rs:49
#6  do_call<closure,i32> () at src/libstd/panicking.rs:293
#7  0x000055555556454a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:85
#8  0x000055555556327d in try<i32,closure> () at src/libstd/panicking.rs:272
#9  catch_unwind<closure,i32> () at src/libstd/panic.rs:388
#10 lang_start_internal () at src/libstd/rt.rs:48
#11 0x0000555555558192 in main ()

One more practical detail: in case of a process crash (e.g. caused by panic!()), the system will dump the process (stack) into a so-called core file. This core file contains the BuildId of the executable as well as the BuildIds of all dynamic shared-object libraries (so-files) as found on that platform when the process started up (not the so-files in your build environment). Tools such as gdb will verify that the build-ids in the core file and in the debug-symbol files are identical, otherwise ignoring them.
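
To reproduce this locally, here is a minimal sketch of a program that aborts itself (assuming core dumps have been enabled, e.g. via ulimit -c unlimited):

use std::process;

fn main() {
    // ... application logic ...

    // Note: a plain panic!() unwinds by default and does not raise SIGABRT;
    // either set panic = "abort" in the Cargo profile or call abort() directly.
    // With core dumps enabled, SIGABRT makes the kernel write a core file
    // carrying the BuildIds described above.
    process::abort();
}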

The tool eu-unstrip (Debian package elfutils) permits extracting from the core file the build-ids of the executable and of the dynamic libraries as they were loaded at runtime (the BuildId 11e0989b7ecb6ec4cd87c526a1dcd7ba3a2a81f5 of the executable appears in the first line of the output below). The BuildIds of the dynamic libraries are:

  • e79e03dc6f0672a9832c68270af07f68f649daf8 –> linux-vdso.so.1
  • 64df1b961228382fe18684249ed800ab1dceaad4 –> ld-2.27.so ld-linux-x86-64.so.2
  • etc.
$ eu-unstrip -n --core=core
0x560be742c000+0x231000 11e0989b7ecb6ec4cd87c526a1dcd7ba3a2a81f5@0x560be742c2bc . - /home/frehberg/src/proj-releasetag/rust-releasetag/test/target/release/test-tag
0x7ffca42e3000+0x2000 e79e03dc6f0672a9832c68270af07f68f649daf8@0x7ffca42e37d0 . - linux-vdso.so.1
0x7f38519ba000+0x229170 64df1b961228382fe18684249ed800ab1dceaad4@0x7f38519ba1d8 /lib64/ld-linux-x86-64.so.2 /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.27.so ld-linux-x86-64.so.2
0x7f3850d86000+0x3f0ae0 b417c0ba7cc5cf06d1d1bed6652cedb9253c60d0@0x7f3850d86280 /lib/x86_64-linux-gnu/libc.so.6 /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.27.so libc.so.6
0x7f3851177000+0x217430 f98df367fb1e663c3b1a49ef86b42e9ec66754f2@0x7f38511771d8 /lib/x86_64-linux-gnu/libgcc_s.so.1 - libgcc_s.so.1
0x7f385138f000+0x21e480 28c6aade70b2d40d1f0f3d0a1a0cad1ab816448f@0x7f385138f248 /lib/x86_64-linux-gnu/libpthread.so.0 /usr/lib/debug/.build-id/28/c6aade70b2d40d1f0f3d0a1a0cad1ab816448f.debug libpthread.so.0
0x7f38515ae000+0x207be0 9826fbdf57ed7d6965131074cb3c08b1009c1cd8@0x7f38515ae1d8 /lib/x86_64-linux-gnu/librt.so.1 /usr/lib/debug/lib/x86_64-linux-gnu/librt-2.27.so librt.so.1
0x7f38517b6000+0x203110 25ad56e902e23b490a9ccdb08a9744d89cb95bcc@0x7f38517b61d8 /lib/x86_64-linux-gnu/libdl.so.2 /usr/lib/debug/lib/x86_64-linux-gnu/libdl-2.27.so libdl.so.2

Please compare the list above with the list of shared objects the executable depends on:

$  ldd target/release/test-tag
	linux-vdso.so.1 (0x00007ffef89b0000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f543fa37000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f543f82f000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f543f610000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f543f3f8000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f543f007000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f543fe6c000)

Now, if you have managed to fetch the corresponding debug-symbol files for the executable and the dynamic libraries from the archive, gdb can be used to print the backtrace of the stack at the moment the process was aborted.

Note: The solib-search-path setting of gdb defines the locations where shared object files are searched for. Its default value is the working directory “.”. So the easiest approach is to collect the specific shared-object (so) libraries with matching BuildIds, place them in a single folder, and change into that directory before executing gdb.
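
Alternatively, the search path can be set explicitly from within the gdb session; for example, assuming the collected libraries were copied into a folder ./libs (see the gdb documentation for details):

(gdb) set solib-search-path ./libs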

$ gdb --core core target/release/test-tag 
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from target/release/test-tag...done.
[New LWP 12597]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./target/release/test-tag'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f4afca12c31 in __GI___nanosleep (requested_time=0x7ffd0ddd6190, 
    remaining=0x7ffd0ddd6190) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
28    ../sysdeps/unix/sysv/linux/nanosleep.c: No such file or directory.
(gdb) bt
#0  0x00007f4afca12c31 in __GI___nanosleep (requested_time=0x7ffd0ddd6190, 
    remaining=0x7ffd0ddd6190) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1  0x0000560ae96808f1 in sleep () at src/libstd/sys/unix/thread.rs:153
#2  sleep () at src/libstd/thread/mod.rs:780
#3  0x0000560ae967e081 in test_tag::main ()
#4  0x0000560ae967e673 in std::rt::lang_start::{{closure}} ()
#5  0x0000560ae96887c3 in {{closure}} () at src/libstd/rt.rs:49
#6  do_call<closure,i32> () at src/libstd/panicking.rs:293
#7  0x0000560ae968a54a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:85
#8  0x0000560ae968927d in try<i32,closure> () at src/libstd/panicking.rs:272
#9  catch_unwind<closure,i32> () at src/libstd/panic.rs:388
#10 lang_start_internal () at src/libstd/rt.rs:48
#11 0x0000560ae967e192 in main ()

Note: By default, core files contain only the process stacks of the main thread and the other threads. So if process status or user settings are needed to understand the situation or use-case in which the crash occurred, it is handy to store such information on the stack (as done by the crate releasetag 😉). By changing OS settings, heap memory may also be dumped into the core file. This might reveal more details, but keep in mind that you might then receive core files of several GBytes.
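
As an illustration of that idea (a hypothetical sketch, not the actual releasetag API): place a recognizable byte pattern into a stack buffer of a long-lived frame, so that it ends up in the core file and can be located, for example, with strings.

fn main() {
    // Illustrative only: a recognizable tag in a stack buffer of main() will
    // appear in the core file and can be found with `strings core | grep RELEASETAG`.
    let tag = *b"RELEASETAG=test-tag-1.0.0";
    // A volatile read keeps the optimizer from discarding the otherwise unused buffer.
    let _keep = unsafe { std::ptr::read_volatile(&tag) };

    // ... rest of the program ...
}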

Summary

Any time an executable is shipped to a customer, perform the following steps; otherwise, if the debug symbols get lost, you might neither be able to run the released code in a debugger nor analyze core files:

  1. First, extract the BuildId from the executable(s) and dynamic libraries,
  2. Then archive the executable(s) and dynamic libraries, using the build-ids from the previous step as database keys,
  3. Finally, strip the debug symbols from these executables and dynamic libraries before packaging and shipping to the customer.
  4. When receiving a crash report, extract the build-ids from the core file, fetch the corresponding binaries containing the debug symbols, and analyze what caused the crash.

EDIT

Release-mode optimization can be configured in Cargo.toml the following way; these settings are applied only when calling “cargo build --release”. Here, debug symbols are included in the release binary.

[profile.release]
opt-level = 3
strip = false
debug = true
codegen-units = 1
lto = true

The optimization levels are defined as follows:

The opt-level setting controls the -C opt-level flag which controls the level of optimization. Higher optimization levels may produce faster runtime code at the expense of longer compile times. Higher levels may also change and rearrange the compiled code which may make it harder to use with a debugger.

The valid options are:

  • 0: no optimizations
  • 1: basic optimizations
  • 2: some optimizations
  • 3: all optimizations
  • "s": optimize for binary size
  • "z": optimize for binary size, but also turn off loop vectorization.