I have been meaning to write down my checklist of *nix commands that are handy for basic operations on data. I will update this post as I remember or come across anything else that fits here. All of these commands were tested on macOS specifically.
Uniques
uniq - The Unix command for filtering repeated lines, mainly used to remove duplicates from a file. It only compares adjacent lines, so the file has to be sorted first for uniq to catch every duplicate.
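To see why the sort matters, here is a minimal sketch that runs uniq on a made-up three-line input (not the test file used in the rest of this section), once without sorting and once with. The variable names are only for illustration.

```shell
# uniq only collapses duplicates that sit on adjacent lines.
# Unsorted input: the two "bb" lines are separated by "aa", so both survive.
unsorted_result=$(printf 'bb\naa\nbb\n' | uniq)

# Sorted input: the duplicates become adjacent and are collapsed.
sorted_result=$(printf 'bb\naa\nbb\n' | sort | uniq)

printf 'without sort:\n%s\n' "$unsorted_result"
printf 'with sort:\n%s\n' "$sorted_result"
```

If all you need is the de-duplicated list, sort -u gives the same result as sort | uniq in one step.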
Consider file test which contains the following
$ cat test
aa
bb
bb
cc
cc
cc
Remove duplicates
$ uniq test
aa
bb
cc
Count occurrences of each item
$ uniq -c test
1 aa
2 bb
3 cc
Print only duplicate items in file
$ uniq -d test
bb
cc
Print only the lines that are not repeated
$ uniq -u test
aa
Consider test now contains
$ cat test
aa
bb
cc
AA
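cC
Remove duplicates case-insensitively. This file is not sorted, so it has to be sorted before it is passed to uniq. The -i flag makes the comparison case-insensitive. For larger files, sort -f (fold case) is the safer choice, because it guarantees that upper and lower case variants of a line end up next to each other.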
$ sort test | uniq -i
AA
bb
cC
Sort a fixed-width file by a field that begins at the 10th character and ends at the 20th, and show the first 10 lines of the result
sort -k1.10,1.20 file | head -10
Case conversion
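Convert all upper case in fileA.txt to lower case and write the result to fileB.txt
$ tr '[:upper:]' '[:lower:]' < fileA.txt > fileB.txt
Using tr to replace characters in a file
Convert all carriage returns to newline chars. Write the carriage return as the escape \r; a ^M typed as a caret followed by M is just two ordinary characters to tr.
$ tr '\r' '\n' < input.csv > output.csv
Delete all CR and LF chars from a file. Note that this joins the whole file into a single line.
$ tr -d '\r\n' < inpfile.txt > outfile.txt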
Squeeze runs of repeated spaces in a file down to a single space
tr -s " " < file.txt > fileout.txt
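A quick way to try tr without touching real files is to pipe a string into it with printf. The sketch below uses made-up sample strings. Deleting only the \r (so that CR+LF line endings become plain LF) is my assumption about the most common clean-up and is a variant of the commands above, not one of them.

```shell
# CR+LF line endings reduced to plain LF by deleting only the carriage returns.
crlf_fixed=$(printf 'a,b\r\nc,d\r\n' | tr -d '\r')

# Runs of spaces squeezed down to a single space.
squeezed=$(printf 'too    many   spaces\n' | tr -s ' ')

# Upper case converted to lower case.
lowered=$(printf 'Hello World\n' | tr '[:upper:]' '[:lower:]')

printf '%s\n' "$crlf_fixed" "$squeezed" "$lowered"
```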
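File comparison
comm compares two sorted files line by line, so sort both files first. It prints three columns: lines only in fileA, lines only in fileB, and lines in both. The -1, -2 and -3 flags suppress the corresponding column.
Compare two files and keep lines present in fileA but not in fileB
$ comm -23 fileA fileB
Compare two files and keep lines present in fileB but not in fileA
$ comm -13 fileA fileB
Compare two files and keep only lines which are present in both files
$ comm -12 fileA fileB
Sed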
The primary purpose of sed is string or pattern replacement.
Consider the following file as input
$ cat file.txt
unix is great os. unix is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.
Replacing or substituting a string
$ sed 's/unix/linux/' file.txt
linux is great os. unix is opensource. unix is free os.
learn operating system.
linuxlinux which one you choose.
By default, sed replaces only the first occurrence of the pattern in each line; it does not replace the second, third, ... occurrence. Here "s" specifies the substitution operation, the "/" characters are delimiters, "unix" is the search pattern and "linux" is the replacement string. If you leave out the closing delimiter, the expression errors out as below
$ sed 's/unix/linux' file.txt
sed: 1: "s/unix/linux": unterminated substitute in regular expression
Replacing the nth occurrence of a pattern in a line. Append a number after the final delimiter (/1, /2, etc.) to replace only the first, second, ... occurrence of a pattern in each line. The command below replaces the second occurrence of "unix" with "linux" in each line.
$ sed 's/unix/linux/2' file.txt
unix is great os. linux is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.
Here is the first occurrence, which is the default
$ sed 's/unix/linux/1' file.txt
linux is great os. unix is opensource. unix is free os.
learn operating system.
linuxlinux which one you choose.
And the third occurrence
$ sed 's/unix/linux/3' file.txt
unix is great os. unix is opensource. linux is free os.
learn operating system.
unixlinux which one you choose.
To replace all occurrences, use the 'g' flag (global replacement)
$ sed 's/unix/linux/g' file.txt
linux is great os. linux is opensource. linux is free os.
learn operating system.
linuxlinux which one you choose.
To make the search case-insensitive without relying on a sed flag (the sed shipped with macOS has not always had one), use a bracket expression in the regex. For example, modify file.txt to the following
$ vi file.txt
unix is great os. Unix is opensource. unix is free os.
learn operating system.
Unixlinux which one you choose.
$ sed 's/[Uu]nix/linux/g' file.txt
linux is great os. linux is opensource. linux is free os.
learn operating system.
linuxlinux which one you choose.
How to find a string in all the files contained in a directory. You could use grep or find.
grep -lr searchStr mydir
grep --recursive --ignore-case --files-with-matches "searchStr" mydir
find mydir -type f | xargs grep -l searchStr
To find/replace multiple strings use the -e flag.
sed -e 's/unix/linux/g' -e 's/Unix/Linux/g' file.txt
linux is great os. Linux is opensource. linux is free os.
learn operating system.
Linuxlinux which one you choose.
To replace a string only where it appears at the start of a line, anchor the pattern with ^
sed 's/^learn/learn to use/g' file.txt
unix is great os. Unix is opensource. unix is free os.
learn to use operating system.
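Unixlinux which one you choose.
To remove whitespace characters at the end of each field in a pipe-delimited file, that is, directly before a | delimiter. [[:blank:]] matches a space or a tab.
sed 's/[[:blank:]]*|/|/g' file.txt
To check whether your file has trailing whitespace or tab characters, open it in vi and turn on list mode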
vi file.txt
:set list
To remove BOM (Byte Order Mark) characters from your file, open the file in binary mode using the -b flag to verify that it has a BOM, and then remove it
vi -b file.txt
:set nobomb
:wq
Use the -i flag with a backup extension to overwrite the existing file and keep a backup of the original. For example, to remove all spaces in a file
sed -i .bak 's/ //g' file.txt
cat file.txt
unixisgreatos.Unixisopensource.unixisfreeos.
learnoperatingsystem.
Unixlinuxwhichoneyouchoose.
This creates a backup file called file.txt.bak with the original contents and overwrites file.txt with the space-free version.
To remove only the trailing spaces in a line, use " *$" (a space followed by *$). The * character means "any number of the previous character" and $ refers to the end of the line.
sed -i .bak 's/ *$//g' file.txt
Verify that the trailing whitespace is removed with :set list
vi file.txt
:set list
unix is great os. Unix is opensource. unix is free os.$
learn operating system.$
Unixlinux which one you choose.$
To remove whitespace between XML tags only
sed -i .bak -e 's/> *</></g' file.xml
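To preview what this substitution does before editing a file in place, you can run it on standard input. A minimal sketch with a made-up one-line XML sample (the element names are purely illustrative):

```shell
# A made-up XML sample with spaces between the tags.
xml_sample='<person>   <name>aa</name>  <city>bb</city> </person>'

# Same substitution, applied to stdin so that no file is modified.
compact=$(printf '%s\n' "$xml_sample" | sed 's/> *</></g')

printf '%s\n' "$compact"
```

The pattern only matches spaces that sit between a > and a < on the same line, so tabs and line breaks between tags are left alone, as is the text inside the elements.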
To replace a blank line with something else, match it with ^$, i.e. an end-of-line immediately after a beginning-of-line
vi file.txt
unix is great os. Unix is opensource. unix is free os.

learn operating system.
Unixlinux which one you choose.
sed 's/^$/this used to be a blank line/' file.txt
unix is great os. Unix is opensource. unix is free os.
this used to be a blank line
learn operating system.
Unixlinux which one you choose.
To remove tabs at the end of a line. Example: add a tab to the end of the first line, so that :set list shows ^I
vi file.txt
unix is great os. Unix is opensource. unix is free os.^I$
learn operating system.$
Unixlinux which one you choose.$
To type a literal tab in your sed command, press ctrl + v and then ctrl + i. In the command below, <tab> stands for that literal tab character.
sed -i.bak 's/<tab>*$//' file.txt
vi file.txt
:set list
unix is great os. Unix is opensource. unix is free os.$
learn operating system.$
Unixlinux which one you choose.$
Consider file test which contains the following
$ cat test
(firstname).aa
(firstname).bb
(firstname).bb
(firstname).cc
(firstname).CC
(lastname).hh
(lastname).jj
(lastname).ll
To extract the content after firstname
sed -En 's/.*firstname\)\.([A-Za-z]+).*/\1/p' test
aa
bb
bb
cc
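CC
To extract everything before some content (the greedy .* keeps everything up to the last occurrence of somecontent, and lines without a match are not printed)
sed -En 's/(.*)somecontent.*/\1/p' file.txt > output.file
or, to cut from the first occurrence onwards and leave non-matching lines as they are
sed 's/somecontent.*//' file.txt
To split by separator '_' and take the first part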
awk -F '_' '{print $1}' file.txt
To add a comma after every space-separated word in a file (the last word on each line is left as it is, since no space follows it)
sed -i.bak 's/ /, /g' file.txt
To add a comma at the end of every line in a text file
sed -i'.bak' 's/$/,/g' file.txt
To remove a trailing comma from each line in a file
sed -i.bak 's/,$//' file.txt
To remove all double quotes in a file
sed -i'.bak' 's/\"//g' file.txt
To remove all single quotes in a file
sed -i'.bak' "s/'//g" file.txt
To remove everything after the first comma in each line of a file
awk -F ',' '{print $1}' file.txt > file_temp.txt && mv file_temp.txt file.txt
or with sed
sed -i.bak 's/,.*$//' file.txt && rm file.txt.bak
To extract everything between the first and second comma in a file
awk -F ',' '{print $2}' file.txt
To add a prefix at the beginning of every line in a file
sed -i.bak 's/^/prefix/' file.txt
To add quotes around the first field of every line. Here , is the delimiter between fields, $1 selects the first field, sub is awk's substitute function, and & stands for the text that was matched. Note that sub treats $1 as a regular expression, so fields containing regex metacharacters may not match as expected. See https://superuser.com/questions/664125/unix-surround-first-column-of-csv-with-double-quotes for more details
awk -F, '{sub($1, "\"&\""); print}' file.txt
To copy the records of a large file that contain the string 'FOO' and add them back with 'FOO' replaced by 'BAR'. Example:
cat fileA.txt
aaaa
bbb
ccccFOO
ddddFOO
First create a copy of the file with FOO replaced by BAR, and then merge the two files keeping only unique records.
sed -i.bak 's/FOO/BAR/g' fileA.txt
This rewrites fileA.txt with the BAR records and keeps the original contents in fileA.txt.bak
cat fileA.txt
aaaa
bbb
ccccBAR
ddddBAR
To verify that the correct number of records exists and has been copied, you can use the following commands
grep -c 'FOO' fileA.txt.bak
grep -c 'BAR' fileA.txt
Also, to get the number of lines in each file
wc -l fileA.txt
wc -l fileA.txt.bak
Now merge the two files, keeping only unique records.
sort -u fileA.txt fileA.txt.bak > fileA.txt_o && mv fileA.txt_o fileA.txt
Now fileA.txt should have everything. You can use grep -c and wc -l to verify this file.
cat fileA.txt
aaaa
bbb
ccccBAR
ccccFOO
ddddBAR
ddddFOO
Search Strings
Total occurrences of searchStr in current directory
grep -ro searchStr . | wc -l | xargs echo "Total matches :"
Total number of files where searchStr occurs in current directory
grep -lr searchStr . | wc -l | xargs echo "Total files :"
To get an exact word match use the -w flag.
grep -lwr searchStr mydir
Recursively replace the string original with replacement in all files under the macOS directory mydir (excludes hidden files and folders)
find mydir \( ! -regex '.*/\..*' \) -type f -exec sed -i '' 's/original/replacement/g' {} \;
OR
find mydir \( ! -regex '.*/\..*' \) -type f -exec sed -i '' 's/original/replacement/g' {} +
The regex excludes all hidden files and folders, which is particularly important if you want to avoid unknowingly messing up your .DS_Store or .git files. The {} + form passes many files to each sed invocation instead of running sed once per file. If you use zsh, then the following would also work
sed -i '' 's/original/replacement/g' **/*(D.)
This does not exclude hidden files, though. **/*(D.) is the zsh way of saying: recurse through all subdirectories and match all regular files (.), including hidden ones (D).
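Before running a recursive replacement on a real directory, it can help to try the find expression on a throwaway tree. A minimal sketch, assuming a temporary directory created with mktemp and made-up file names. It uses sed -i.bak rather than sed -i '' so that the same line also runs with GNU sed, at the cost of leaving .bak files behind.

```shell
# Build a throwaway tree: one visible file, one hidden file,
# and one file inside a hidden folder.
workdir=$(mktemp -d)
mkdir -p "$workdir/mydir/.git"
printf 'original text\n' > "$workdir/mydir/visible.txt"
printf 'original text\n' > "$workdir/mydir/.hidden.txt"
printf 'original text\n' > "$workdir/mydir/.git/config"

# Dry run: list the files that contain the string and would be edited.
matches=$(cd "$workdir" && find mydir \( ! -regex '.*/\..*' \) -type f -exec grep -l 'original' {} +)

# Real run: same find expression, now with sed editing in place.
(cd "$workdir" && find mydir \( ! -regex '.*/\..*' \) -type f -exec sed -i.bak 's/original/replacement/g' {} +)

visible=$(cat "$workdir/mydir/visible.txt")
hidden=$(cat "$workdir/mydir/.hidden.txt")
in_hidden_dir=$(cat "$workdir/mydir/.git/config")

printf 'edited: %s\n' "$matches"
printf 'visible.txt: %s\n.hidden.txt: %s\n.git/config: %s\n' "$visible" "$hidden" "$in_hidden_dir"

# Clean up the throwaway tree (this also removes the .bak files).
rm -rf "$workdir"
```

Only visible.txt is changed. The hidden file and the file inside the hidden folder keep their original contents.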
Delete all files of a certain type under current directory
find . -name "*.pyc" -exec rm -f {} \;
Replace a string with another string in all shell scripts under current directory (on macOS, sed -i needs an explicit backup extension, which can be empty)
find . -name '*.sh' -exec sed -i '' 's/foo/bar/g' {} \;
or, for all files under a directory
find <path-to-directory> -type f -print0 | xargs -0 sed -i '' 's/foo/bar/g'
Remove everything after the first space in a line (or extract the first word from a line)
awk '{ print $1 }' < input > output
Vi see line numbers
:set number
Extract a range of lines from a large file. The q command on the line after the range stops sed from reading the rest of the file.
sed -n '105830,106694p;106695q' logfile > output
starting line number: 105830,
ending line number: 106694
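The same pattern works with shell variables for the range. A small sketch, assuming a generated 20-line sample file in place of a real log; start and end are illustrative names, not anything sed requires.

```shell
# Generate a 20-line sample standing in for a large log file.
logfile=$(mktemp)
seq 1 20 | sed 's/^/line /' > "$logfile"

start=5   # first line to keep
end=8     # last line to keep

# Print lines start..end, then quit on the following line
# so that the rest of the file is never read.
extracted=$(sed -n "${start},${end}p;$((end + 1))q" "$logfile")

printf '%s\n' "$extracted"
rm -f "$logfile"
```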