Use `tr` to delete non-printable Unicode characters
- Author: Linux Bash
Blog Article: Using `tr` to Delete Non-printable Unicode Characters in Linux Bash
When working with text files in a Linux environment, you may run into non-printable characters that disrupt file processing or display. In this post, we’ll explore how to use the `tr` command to strip them out.
Q1: What is the `tr` command in Linux Bash?
A1: `tr` stands for "translate" or "transliterate". It is a command-line utility on Unix-like operating systems, including Linux, for translating characters, deleting them, or squeezing runs of repeated characters. It reads from standard input and writes to standard output.
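As a minimal sketch of those three modes, the snippet below runs `tr` on short literal strings. The variable names are illustrative only.

```shell
# Translate: map each character in the first set to the one at the
# same position in the second set (e -> i, l -> p).
translated=$(printf 'hello\n' | tr 'el' 'ip')

# Delete (-d): drop every character that appears in the set.
deleted=$(printf 'hello\n' | tr -d 'l')

# Squeeze (-s): collapse each run of a repeated character into one.
squeezed=$(printf 'a    b   c\n' | tr -s ' ')

printf '%s\n' "$translated" "$deleted" "$squeezed"
```

In every case the input arrives on standard input through a pipe, because `tr` does not take file names as arguments.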
Q2: How can `tr` be used to delete non-printable Unicode characters?
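A2: `tr` can be paired with a character class or range that describes the characters to keep. The `[:print:]` class matches printable characters, and combining the `-c` (complement) and `-d` (delete) options removes everything not in that class. Two caveats apply. First, `[:print:]` does not include tab or newline, so add those to the set if you want to keep them. Second, GNU `tr` works on single bytes rather than multibyte characters, so non-ASCII bytes are typically deleted as well, which strips printable UTF-8 characters such as "é" along with the non-printable ones.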
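The following sketch shows the `-c` and `-d` options with `[:print:]` on a small sample, assuming the C locale (forced here with `LC_ALL=C` so the result does not depend on the environment). The `sample` function is a hypothetical helper that only produces test input.

```shell
# Sample input: "A", BEL (octal 007), "B", a tab, "C", and a newline.
sample() { printf 'A\007B\tC\n'; }

# [:print:] on its own: BEL is removed, but so are the tab and the newline.
print_only=$(sample | LC_ALL=C tr -cd '[:print:]')

# Adding \t and \n to the set keeps the whitespace and still removes BEL.
print_plus_ws=$(sample | LC_ALL=C tr -cd '[:print:]\t\n')

printf '%s\n' "$print_only" "$print_plus_ws"
```

The second form is usually the safer choice for line-oriented text, since it leaves the line structure intact.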
Q3: Can you give a practical example of using `tr` to delete non-printable characters?
A3: Certainly! Suppose you have a text file named "example.txt" that contains a mix of printable and non-printable characters. To keep only tab, newline, carriage return, and printable ASCII, and delete every other byte, you can use the following command:
tr -cd '\11\12\15\40-\176' < example.txt > cleaned_example.txt
This command uses a range of octal character codes:
- `\11` is the octal code for horizontal tab.
- `\12` is the octal code for newline.
- `\15` is the octal code for carriage return.
- `\40-\176` covers the range of printable ASCII characters, from space to tilde.
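To see what this set keeps and what it drops, here is a small sketch run on a literal string instead of a file. The input mixes an ANSI escape sequence with the UTF-8 encoding of "é" (the two bytes `\303\251`). The variable names are illustrative.

```shell
# "café", a space, ESC (octal 033) followed by "[1m", then "bold".
input=$(printf 'caf\303\251 \033[1mbold\n')

# Keep tab, newline, carriage return, and printable ASCII only.
ascii_only=$(printf '%s\n' "$input" | LC_ALL=C tr -cd '\11\12\15\40-\176')

printf '%s\n' "$ascii_only"
```

The ESC byte is removed, and so are both bytes of "é", because they fall outside the `\40-\176` range. This filter is therefore best suited to text that is expected to be plain ASCII.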
Background on the Topic
The `tr` command operates by either deleting specified characters or replacing one set of characters with another. Here are a couple more examples to show its versatility:
Convert lowercase to uppercase:
echo "hello world" | tr 'a-z' 'A-Z'
This command translates all lowercase letters to uppercase.
Delete digits:
echo "123 Easy Street" | tr -d '0-9'
This removes all digits from the input string, outputting " Easy Street".
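Another common deletion task, sketched here on assumed sample input, is stripping carriage returns from text with Windows-style line endings. The `crlf_text` function is a hypothetical helper that only generates the sample.

```shell
# Two lines with CRLF line endings.
crlf_text() { printf 'first\r\nsecond\r\n'; }

# Delete every carriage return, leaving plain LF line endings.
unix_text=$(crlf_text | tr -d '\r')

# Count carriage returns before and after: tr -cd '\r' keeps only
# the carriage returns, and wc -c counts the remaining bytes.
before=$(crlf_text | tr -cd '\r' | wc -c)
after=$(printf '%s\n' "$unix_text" | tr -cd '\r' | wc -c)

echo "carriage returns: before=$before after=$after"
```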
Executable Script Demonstrating `tr`
Now, let’s create an executable script to demonstrate how `tr` can clean a text file by removing non-printable characters:
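#!/bin/bash
# Ensure a file name is provided
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <filename>" >&2
    exit 1
fi

input_file=$1
# Write the result next to the input, so paths such as dir/file.txt also work
output_file="$(dirname -- "$input_file")/cleaned_$(basename -- "$input_file")"

# Keep tab, newline, carriage return, and printable ASCII; delete every other byte
tr -cd '\11\12\15\40-\176' < "$input_file" > "$output_file"

echo "Processed file saved as $output_file"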
Save this script as `clean_text.sh`, make it executable with `chmod +x clean_text.sh`, and run it by passing a filename as an argument.
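If the file contains UTF-8 text that should survive cleaning, one possible variant is to delete only the ASCII control bytes instead of keeping only ASCII. The sketch below assumes UTF-8 input; `strip_ascii_controls` is a hypothetical helper name, not part of `tr`.

```shell
# Delete NUL through BS, VT, FF, SO through US, and DEL.
# Tab (\011), newline (\012), and carriage return (\015) are kept,
# and bytes above \177 are left untouched.
strip_ascii_controls() {
    LC_ALL=C tr -d '\000-\010\013\014\016-\037\177'
}

# "café" followed by BEL (octal 007), a space, and "ok".
cleaned=$(printf 'caf\303\251\007 ok\n' | strip_ascii_controls)

printf '%s\n' "$cleaned"
```

Because every byte of a multibyte UTF-8 sequence is above `\177`, this filter does not split such characters. It also cannot remove non-printable characters outside ASCII, such as the zero-width space U+200B; handling those would need a Unicode-aware tool rather than `tr`.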
Conclusion
The `tr` command is a powerful tool in the Linux toolkit, particularly useful for manipulating text data, whether that means translating character sets or purging unwanted characters. By mastering `tr`, you can efficiently manage text processing tasks in your scripts or command-line operations, keeping your data clean and standardized with minimal effort.
Further Reading
For further reading and resources related to the `tr` command in Linux, consider exploring these links:
- GNU `tr` Manual Page: This is the official manual page providing detailed usage instructions for the `tr` command. https://www.gnu.org/software/coreutils/manual/html_node/tr-invocation.html
- Advanced Bash-Scripting Guide: An in-depth exploration of Bash scripting, including text manipulation with `tr`. https://tldp.org/LDP/abs/html/textproc.html
- Unix/Linux Character Classes and `tr` Command: This tutorial offers insights into character classes and their use in `tr`. https://www.geeksforgeeks.org/tr-command-in-unix-linux-with-examples/
- `tr` Command Examples for Text Manipulation: A practical guide to different ways you can use the `tr` command. https://linuxize.com/post/how-to-use-linux-tr-command/
- Discussion on Stack Overflow - Handling Unicode Characters: Learn from community insights on handling Unicode characters with `tr`. https://stackoverflow.com/questions/6194499/pushing-files-with-unicode-characters-in-filenames-to-linux-via-git-push
These resources should provide a more comprehensive understanding of text manipulation in Linux environments, enhancing your skills with the `tr` command and beyond.