A sign in Space, Part 1: Determining file format and entropy
So, I just downloaded the first file along with a metadata file; I am only going to download the other files after having exhausted this one. After download I have the following files:
In [31]: ls -lh *ATA-X*
-rw-r--r--. 1 alex alex 4.5G May 25 09:31 A_Sign_in_Space-ATA-X.sigmf-data
-rw-r--r--. 1 alex alex 659 May 25 08:47 A_Sign_in_Space-ATA-X.sigmf-meta
OK, now let’s look into the metadata file.
{
"global": {
"core:datatype": "ci8",
"core:description": "A Sign in Space: Allen Telescope Array recording",
"core:hw": "ATA 20-antenna beamformer (X polarization)",
"core:recorder": "blade",
"core:sample_rate": 1000000,
"core:sha512":
"fa797c2cd5ea92f4b5ece999182bd62bfb1dcde147dac02a288b10853e9a1e397cb7f70a061ef271c83f96df623d596490957f80c662f81bb593840d8145e42b",
"core:version": "1.1.0"
},
"captures": [
{
"core:datetime": "2023-05-24T19:11:17Z",
"core:frequency": 8410135000,
"core:sample_start": 0
}
],
"annotations": []
}
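The fields that matter for the next steps can also be pulled out programmatically. The following is a minimal sketch using only Python's json module; the function name summarize_sigmf_meta and the trimmed inline copy of the metadata are illustrative assumptions, not part of any SigMF tooling.

```python
import json

# Trimmed copy of the metadata shown above (description and sha512 left out).
META_JSON = """
{
  "global": {
    "core:datatype": "ci8",
    "core:hw": "ATA 20-antenna beamformer (X polarization)",
    "core:sample_rate": 1000000,
    "core:version": "1.1.0"
  },
  "captures": [
    {
      "core:datetime": "2023-05-24T19:11:17Z",
      "core:frequency": 8410135000,
      "core:sample_start": 0
    }
  ],
  "annotations": []
}
"""


def summarize_sigmf_meta(meta_text):
    """Return the handful of fields needed to interpret the data file."""
    meta = json.loads(meta_text)
    global_fields = meta["global"]
    first_capture = meta["captures"][0]
    return {
        "datatype": global_fields["core:datatype"],
        "sample_rate_hz": global_fields["core:sample_rate"],
        "center_mhz": first_capture["core:frequency"] / 1e6,  # Hz to MHz
        "start_time": first_capture["core:datetime"],
    }


summary = summarize_sigmf_meta(META_JSON)
for key, value in summary.items():
    print(f"{key}: {value}")
```

For the real file, the same function could be fed the contents of A_Sign_in_Space-ATA-X.sigmf-meta instead of the inline string.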
Sooo… what can I derive from this information? It seems to be a recording of the X polarization from a 20-antenna beamformer, taken at a center frequency of 8410.135 MHz with a sample rate of 1000000 (per second, I assume).
Doing the obvious, using file to check the data format, didn’t yield anything but simply data.
Let’s check the Shannon entropy over the whole file. It is 6.822892707768326 (out of a maximum of 8 bits per byte), which is not really helpful. Let’s inspect the file in a hexdump:
In [37]: !hexdump -n 512 -C A_Sign_in_Space-ATA-X.sigmf-data
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000040 00 00 00 00 00 00 00 00 00 00 01 01 00 00 04 0a |................|
00000050 14 04 fe ef cd 2c f9 c5 08 0a e9 16 ff fd da 1c |.....,..........|
00000060 0f 1b 0c ea 1d 4c 00 f3 f2 d4 0d 19 e7 14 0d eb |.....L..........|
00000070 e8 01 27 16 2c d8 04 fc ed 1c 1e 14 08 f6 25 e2 |..'.,.........%.|
00000080 1b 10 e5 06 fb 0d fe eb cc 09 f8 d8 ea 21 07 01 |.............!..|
00000090 f3 f1 da 0e 1e 00 09 fe 27 17 ee fc ef f8 ea 0b |........'.......|
000000a0 fa fe 29 05 42 e4 09 1c 0d 0a 06 f0 03 ea d1 09 |..).B...........|
000000b0 05 17 eb 07 f8 18 05 06 f8 f6 2d 06 19 d6 b9 0d |..........-.....|
Hm, now, there seems to be some kind of empty header of all 0x00 for the first 74 bytes (up to offset 0x4a), or it’s just a measurement artifact of the receiver having its squelch turned up.
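Both byte-level observations so far, the entropy figure and the run of leading zeros, can be reproduced in a few lines of Python. This is a sketch rather than the exact code behind the number quoted above; the function names are made up for illustration, and the sample bytes are copied by hand from the first rows of the hexdump.

```python
import math
from collections import Counter


def entropy_from_counts(counts):
    """Shannon entropy in bits per byte, given a Counter of byte values."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def shannon_entropy(data):
    """Entropy of an in-memory byte string (0.0 to 8.0 bits per byte)."""
    return entropy_from_counts(Counter(data))


def file_entropy(path, chunk_size=1 << 20):
    """Same measure for a large file, read in chunks to keep memory flat."""
    counts = Counter()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            counts.update(chunk)
    return entropy_from_counts(counts)


def leading_zero_bytes(data):
    """Number of 0x00 bytes before the first non-zero byte."""
    return len(data) - len(data.lstrip(b"\x00"))


# Offsets 0x00 to 0x5f of the hexdump: 74 zero bytes, then the first data.
sample = (
    bytes(74)
    + bytes.fromhex("01 01 00 00 04 0a")
    + bytes.fromhex("14 04 fe ef cd 2c f9 c5 08 0a e9 16 ff fd da 1c")
)

print("leading zeros:", leading_zero_bytes(sample))
print("entropy of sample:", shannon_entropy(sample))
```

A Counter-based loop over 4.5 GB will be slow in pure Python, so for the whole file a vectorised histogram would probably be the more practical choice.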
OK, let’s try something else; as the header file indicates that it has a sample rate of 1000000, let’s try to convert this to an audio file and see if there’s something to listen to. Let’s grab the first 10 seconds, assume mono audio and a sample rate of 1000000, and throw it at my soundcard.
$ dd if=A_Sign_in_Space-ATA-X.sigmf-data of=10_s.raw bs=1000000 count=10
10+0 records in
10+0 records out
10000000 bytes (10 MB, 9.5 MiB) copied, 0.0126253 s, 792 MB/s
OK, but what kind of samples are these… 8 bit? 16 bit? Signed? Unsigned? Which endianness? One can only guess, so it’s trial and error.
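The metadata may already narrow this down: in the SigMF specification, a core:datatype of ci8 denotes complex samples whose I and Q components are signed 8-bit integers, stored interleaved. Under that reading (an interpretation of the metadata, not something the trial-and-error run here relies on), the bytes would pair up as sketched below. The helper name decode_ci8 is hypothetical, and the input is the first eight bytes at offset 0x50 of the hexdump.

```python
import array


def decode_ci8(raw):
    """Interpret raw bytes as interleaved signed 8-bit I/Q pairs."""
    if len(raw) % 2:
        raw = raw[:-1]  # drop a trailing byte that has no partner
    values = array.array("b")  # 'b' = signed char, one byte per item
    values.frombytes(raw)
    # Even positions are I (real), odd positions are Q (imaginary).
    return [complex(i, q) for i, q in zip(values[0::2], values[1::2])]


row = bytes.fromhex("14 04 fe ef cd 2c f9 c5")
iq_samples = decode_ci8(row)
for sample in iq_samples:
    print(sample)
```

If this interpretation holds, every complex sample takes two bytes, and an audio decoder that treats the stream as mono PCM would mix I and Q together, which could be one reason the output sounds like noise.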
ffmpeg -f s16le -ar 1000000 -ac 1 -i 10_s.raw s16le_10_s.wav
(Note that the resulting file has a length of 5 seconds, as I picked a word length of 16 bits.)
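The arithmetic behind that note is simply bytes divided by bytes per second. A small sketch, with duration_seconds as an illustrative helper name, shows how the assumed word length changes the playback time of the same 10 MB chunk, and how a dd skip offset translates into seconds:

```python
def duration_seconds(num_bytes, sample_rate_hz, bytes_per_sample):
    """Playback time of a raw mono stream under an assumed sample width."""
    return num_bytes / (sample_rate_hz * bytes_per_sample)


# The 10 MB chunk produced by dd, at the 1 MHz rate from the metadata.
print(duration_seconds(10_000_000, 1_000_000, 2))  # 16-bit words: 5.0
print(duration_seconds(10_000_000, 1_000_000, 1))  # 8-bit words: 10.0

# A dd skip of 60 blocks of 1,000,000 bytes, read as 16-bit words: 30.0
print(duration_seconds(60 * 1_000_000, 1_000_000, 2))
```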
OK, that just results in white noise. Trying big endian: Also just white noise. Maybe I should fast forward a little, as the first 5 seconds might not be representative at all. Fast forwarding 30 seconds and trying again:
dd if=A_Sign_in_Space-ATA-X.sigmf-data of=1m_10_s.raw bs=1000000 count=10 skip=60
ffmpeg -f s16le -ar 1000000 -ac 1 -i 1m_10_s.raw s16le_1m_10_s.wav
Also: Nope… OK, since I have better things to do now, I’m simply gonna convert the whole 4.5 GB file in both flavors, little and big endian, and listen to it again.
Read you later…