Awk: how to change timestamp format of a column
Recently i had a task to convert a csv file with timestamps in one column from a format to another! i didn't want to use Python for this…
Recently i had a task to convert a csv file with timestamps in one column from a format to another! i didn't want to use Python for this since i think it is an overkill nor universal in terms of portability, i tried to do bash manipulation using reading the file in an array using cut to get the desired field and use date to change the format, it worked but was ugly and underperforming! The best solution was awk! lets see how!.
Scenario
In my case what exactly i had to do is to modify a human readable timestamp to an epoch timestamp of nanosecond accuracy.
2023-06-24T18:59:42.803Z ---> 1687633182803000000 Create the following file and save it as test.csv
2023-06-24T18:59:42.803Z,ford,focus
2023-06-24T19:10:47.626Z,mazda,mx5
2023-06-24T22:18:50.391Z,toyota,yaris
2023-06-23T15:59:41.591Z,toyota,corolla
2023-06-23T18:52:42.346Z,nissan,micraThe first column is the column that we want to transform to epoch.
Working with awk
Lets try something basic first, lets see how to parse only the first column which is the timestamp column and convert it to epoch.
$ awk -F "," '{system("date -d "$1" +%s%9N")}' test.csv
1687633182803000000
1687633847626000000
1687645130391000000
1687535981591000000
1687546362346000000- We invoked awk to parse file test.csv,
-F “,”option said to awk that the file is comma delimited - then we used the
systemcommand of awk to execute the systemdatecommand in my Linux box for each element of of the first column (the $1 variable) - date command using the
-dparameter and the+%s%9Nformatting string parsed dates and converted them to epoch seconds with nano second accuracy
You can check more about the date command and its formating options here https://man7.org/linux/man-pages/man1/date.1.html
Now lets try the complete thing! to print the rest of the columns along with the converted timestamp
$ awk -F "," '{"date -d "$1" +%s%9N" | getline $1;printf("%s,%s,%s\n",$1,$2,$3)}' test.csv
1687633182803000000,ford,focus
1687633847626000000,mazda,mx5
1687645130391000000,toyota,yaris
1687535981591000000,toyota,corolla
1687546362346000000,nissan,micraWhat we did here is
- for each value of $1 executed the date command to convert the timestamp and with the
| getlinewe passed the value back to $1 - Then using printf we printed the column values with the selected format
But what if we wanted to keep the old time format and the same time convert the timestamp?
$ awk -F "," '{"date -d "$1" +%s%9N" | getline ts;printf("%s,%s,%s,%s\n",ts,$1,$2,$3)}' test.csv
1687633182803000000,2023-06-24T18:59:42.803Z,ford,focus
1687633847626000000,2023-06-24T19:10:47.626Z,mazda,mx5
1687645130391000000,2023-06-24T22:18:50.391Z,toyota,yaris
1687535981591000000,2023-06-23T15:59:41.591Z,toyota,corolla
1687546362346000000,2023-06-23T18:52:42.346Z,nissan,micraWhat we did now is very similiar to the previous example but with some slight differences
| getline tswe stored the converted timestamp to a variable namedts- using printf we print all three existing column values plus the value of the converted timestamp (ts)
Conclusion
Awk is a complete language but can be used also as a command line tool in order to make useful oneliners that can help us to do quick wins instead of using a language like Python! it seems that old tools still have their value in our modern Linux echo system and are not going to replaced any time soon!