Dealing with file names containing spaces in a bash script
#1
When using a for File in $(find command) statement, any file name containing a space becomes problematic: the name is split at each space, and each piece becomes a separate argument to the for loop. Alternatively, quoting the command substitution, as in for File in "$(find command)", goes too far the other way and produces one large argument containing all of the file names.

The bash shell uses the Internal Field Separator (IFS) for word splitting after expansion and to split lines into words with the read builtin command. The default value is ``<space><tab><new-line>''. So any file name containing one or more spaces will be split into multiple words. Changing IFS to a newline only remedies the situation, because find prints one file name per line. Please observe the following code and run time output.
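The effect can also be reproduced without creating any files. The sketch below is an illustration, not the script from this post: the variable listing is a hypothetical stand-in for the output of $(find ...), and $'\n' is bash's ANSI-C quoting for a single newline.

```shell
#!/usr/bin/env bash
# Sketch: how IFS controls word splitting of an unquoted expansion.
# $listing is a hypothetical stand-in for $(find ...): two names, one per line.
listing=$(printf '%s\n' ' space_before between_after ' 'files_space_find')

echo "Default IFS (space, tab, newline):"
for File in $listing; do
    echo ":${File}:"          # three iterations: the first name is split
done

echo "IFS set to newline only:"
IFS_Saved=$IFS                # remember the current value
IFS=$'\n'                     # split on newlines only
for File in $listing; do
    echo ":${File}:"          # two iterations: spaces are preserved
done
IFS=$IFS_Saved                # restore the saved value

echo "Quoted expansion (no splitting at all):"
for File in "$listing"; do
    echo ":${File}:"          # one iteration holding both names
done
```

The second loop prints ": space_before between_after :" with its leading and trailing spaces intact, which matches the run time output shown further down.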


Code:
$ cat files_space_find


# Change the IFS variable then source this script.

function Find_files()
{

    for File in $(find * -prune -type f -print)
    do
        echo ":${File}:"
    done

}

# main

Find_files

#
### End of script


Notice that the script prints a colon at both ends of each file name, which makes it easy to see where a name begins and ends, including any leading or trailing spaces.


$ ls -1b

space_before between_after
files_space_find

$ ls | od -c
0000000        s   p   a   c   e   _   b   e   f   o   r   e       b   e
0000020    t   w   e   e   n   _   a   f   t   e   r      \n   f   i   l
0000040    e   s   _   s   p   a   c   e   _   f   i   n   d  \n        
0000056


With the default IFS value, notice how the file name is split into separate words at each space during the for loop.


$ echo "$IFS" | od -c
0000000       \t  \n  \n                                                
0000004

$ . files_space_find

:space_before:
:between_after:
:files_space_find:


With IFS containing only a newline, the file name with spaces is no longer split into words.


$ IFS_Saved="$IFS"

$ IFS='
> '

$ echo "$IFS" | od  -c
0000000   \n  \n                                                                                                                
0000002


$ . ./files_space_find
: space_before between_after :
:files_space_find:

$ IFS="$IFS_Saved"

$ echo "$IFS" | od -c
0000000       \t  \n  \n                                                
0000004

$


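Be careful when modifying the IFS variable: the rest of your script or shell session (every unquoted expansion and the read builtin) relies on its value. Always save IFS before changing it and restore the saved value after your for loop code, as the IFS="$IFS_Saved" step above does. Also note that a newline-only IFS still breaks names that contain a newline, and the unquoted $(find ...) result is still subject to filename globbing.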
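One way to follow this advice without a manual save and restore is to make IFS local to the function. This is a sketch, not the original script: it assumes bash (for local and $'\n'), and because globbing is switched off it swaps find * -prune for find . -maxdepth 1 (supported by GNU and BSD find, though not POSIX), so names come out prefixed with ./.

```shell
#!/usr/bin/env bash
# Sketch: the IFS change is confined to the function with `local`,
# so the caller's IFS comes back automatically when it returns.

function Find_files()
{
    local IFS=$'\n'   # split the find output on newlines only
    local File

    set -f            # no globbing of the results (e.g. a file named '*')
    for File in $(find . -maxdepth 1 -type f -print)
    do
        echo ":${File}:"
    done
    set +f            # set -f is not local; assumes globbing was on before
}

# main
Find_files
```

Because set -f is a shell-wide option rather than a local one, the sketch simply switches globbing back on before returning.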
Give a person a fish, and you feed them for a day. Teach a person how to fish, and you feed them for a lifetime. ✝️ Proverbs 4:7 Wisdom is the principal thing; therefore get wisdom: and with all thy getting get understanding.  (Linux Mint 19 XFCE)
#2
If there are spaces in a filename, you put a backslash before each space. The same can be done for other characters that bash would otherwise misinterpret, which is what happens to the space. By putting the backslash before the space, you are telling bash that "the following character is part of the file name". This is better known as "escape characters": if there is a character that the shell might interpret as part of a command (space, asterisk, quote, dollar sign, hash, etc.) and you want it interpreted as part of a file name or a piece of text, you precede it with a backslash. The name "escape characters" comes from the use of "escape sequences" in ANSI terminals and printer formatting commands, which always began with the [esc] character (decimal 27, 0x1B).

For instance:

Code:
cat i\ have\ spaces\ in\ my\ filename.txt

would list the contents of the file "i have spaces in my filename.txt".
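One way to see what the backslashes do is to count the arguments bash actually passes. count_args is a hypothetical helper defined only for this check.

```shell
#!/usr/bin/env bash
# Sketch: count the arguments bash passes for each spelling of the name.
count_args() { echo "$#"; }   # hypothetical helper: prints its argument count

count_args i have spaces in my filename.txt          # prints 6
count_args i\ have\ spaces\ in\ my\ filename.txt     # prints 1
```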
#3
Regarding the use of the find command, another way of dealing with those pesky spaces in file names would be

Code:
find "$DIR" -type f -exec ...

or

Code:
find "$DIR" -type f -print0 | xargs -0 ...

This way there is no need to modify the IFS variable. See 'man find' for further explanations.

BTW, the big round thing is a zero, not an O. ;-)

P.S.: I'm pretty sure there are cases where deck_luck's method is the better approach. I just wanted to point out an alternative. :-D
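A runnable sketch of both find forms follows. printf stands in for whatever command the "..." was meant to run, and list_with_xargs and list_with_exec are hypothetical names. With -exec ... {} +, find passes each name as a separate argument on its own, so the NUL separation of -print0 is only needed on the xargs path.

```shell
#!/usr/bin/env bash
# Sketch: hand file names from find to another command without relying on IFS.
# printf stands in for the real command; $1 is the directory to search.

# 1) -print0 ends every name with a NUL byte and xargs -0 splits only on NUL,
#    so spaces (and even newlines) in names survive intact.
list_with_xargs() {
    find "$1" -type f -print0 | xargs -0 printf ':%s:\n'
}

# 2) With -exec ... {} +, find passes the names as separate arguments itself,
#    so neither -print0 nor xargs is needed.
list_with_exec() {
    find "$1" -type f -exec printf ':%s:\n' {} +
}
```

Usage: list_with_xargs "$DIR" or list_with_exec "$DIR". Both print each name between colons, spaces included.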
#4
(03-19-2019, 03:57 AM)unclejed613 Wrote: if there are spaces in a filename, you put a backslash before the space. [...]


If you are at the interactive bash command line and passing a file name as an argument to a command (unlike my original post, which loops over a list in a for statement inside a bash script), you can simply prevent word splitting by surrounding the file name with quotes. You do not need individual escapes, which are tedious and prone to typos. Try quoting; I think you will like it better.
 

Code:
$ ls -l 'what is the difference between variable post-increment:decrement and pre-increment decrement in bash.pdf'

-rw-r--r--@ 1 webuser  staff  150882 Mar 20 00:57 what is the difference between variable post-increment:decrement and pre-increment decrement in bash.pdf

$



In the above example, two quotes are much easier than eleven escapes. When printing web pages to PDF files, the resulting file name often contains many spaces as well as special characters. Single quotes prevent word splitting and make every special character literal. Double quotes also prevent word splitting and filename globbing, but $ expansions (variables and $(...) command substitution), backquotes and some backslash escapes are still interpreted inside them. In this case, I would recommend single quotes.
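The difference between the two kinds of quotes is easy to check with printf. The variable price is hypothetical and exists only for the demonstration. Each line passes exactly one argument, and the * is expanded in neither.

```shell
#!/usr/bin/env bash
# Sketch: compare what single and double quotes leave active.
price=10   # hypothetical variable, only for this demonstration

printf '<%s>\n' 'costs $price * 2.pdf'   # prints <costs $price * 2.pdf>
printf '<%s>\n' "costs $price * 2.pdf"   # prints <costs 10 * 2.pdf>
```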



(03-19-2019, 01:45 PM)radolkin Wrote: Regarding the use of the find command, another way for dealing with those pesky spaces in file names would be [...]



Sorry, I should have given more context at the start of my first post. I only listed excerpts from my script, which looks for a certain pattern in each file in a specific directory, applies a change to the file data, then moves the file to a specific directory based on the change applied. The script repeats this process for every file in the directory. (Maybe TMI :-) xargs is appropriate for taking the standard output of one command and passing it as argument(s) to another command. However, my script runs many different commands, as well as a case statement, to decide how to process each file in the list generated by the for File in $(find ...) statement.
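For a loop like this, which runs several commands and a case statement per file, one common alternative (not the method used in the original script) combines the NUL separation from the find -print0 suggestion with a while read loop, so IFS never changes for the rest of the script. process_dir and the case patterns below are hypothetical placeholders for the real per-file logic.

```shell
#!/usr/bin/env bash
# Sketch: per-file processing with arbitrary commands and a case statement,
# reading NUL-separated names so IFS stays untouched for the rest of the script.
# process_dir and the patterns/actions are hypothetical placeholders.
process_dir() {
    local dir=$1 File
    while IFS= read -r -d '' File
    do
        case $File in
            *.pdf) echo "pdf:   :${File}:" ;;
            *' '*) echo "space: :${File}:" ;;
            *)     echo "other: :${File}:" ;;
        esac
    done < <(find "$dir" -type f -print0)
}
```

Usage: process_dir "$DIR". Here IFS= applies only to the read command, -r keeps backslashes literal, and -d '' makes read stop at each NUL byte. Feeding the loop with < <(...) instead of a pipe keeps it in the current shell, so variables set inside the loop remain visible afterwards.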


