Sed: replacing files in place
While this low-level OS caching is certainly something to be aware of, for the purposes of the following discussion it is entirely transparent and irrelevant to the points made, so it will be ignored here. Strictly speaking, "in-place" would really mean just that: literally editing the very same file (the same inode). This can be done in principle, but none of the usual tools or editors do it; even when they seem to, they actually create a temporary file behind the scenes. Let's look at what sed and perl (two tools which are often said to be able to do "in-place" editing) do when the option -i is used.
Sed has the -i switch for "in-place" editing. It's a nonstandard extension, and as such not universally available. According to the documentation (at least GNU sed's), what sed does when -i is specified is create a temporary file, send output to that file, and, at the end, rename that file to the original name.
This can be verified with strace; even without strace, a simple "ls -i" of the file before and after sed operates will show two different inode numbers. If you do use -i with sed, make sure you specify a backup extension to save a copy of the original file in case something goes wrong. Only after you're sure everything was changed correctly should you delete the backup. The BSD sed (used on Mac OS X as well) does not accept -i without a backup extension, which is good, although it can be fooled by supplying an empty string (e.g. -i "").
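A quick sketch of the inode check (the file name and the edit are made up; -i.bak works with both GNU and BSD sed, since the suffix can be attached directly to the option):

```sh
printf 'hello\n' > demo.txt
ls -i demo.txt                          # note the inode number
sed -i.bak 's/hello/goodbye/' demo.txt  # edit "in place", keeping a backup
ls -i demo.txt                          # a different inode: sed wrote a new file
cat demo.txt.bak                        # the original data survives in the backup
```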
Perl, similar to sed, has a -i switch to edit "in-place". And like sed, it creates a temporary file. However, the way Perl creates the temporary file is different: Perl opens and immediately unlinks the original file, then opens a new file with the same name (but a new file descriptor and inode), and sends output to this second file; at the end, the old file is closed, and thus deleted because it was unlinked, and what's left is a changed file with the same name as the original.
This is more dangerous than sed's approach, because if the process is interrupted halfway through, the original file is lost, whereas with sed it would still be available even if no backup extension was specified. Thus, it's even more important to supply a backup extension to Perl's -i, which results in the original file being renamed rather than unlinked.
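The kludge dissected below is presumably the classic trick of deleting the file inside a redirection; a concrete sketch, with a made-up file name and sed standing in for the editing command:

```sh
printf 'foo\n' > kludge.txt
# the outer redirection opens kludge.txt for reading *before* rm runs,
# so the data survives (anonymously) even though the name is gone;
# the inner redirection then creates a brand-new file with the old name
{ rm kludge.txt; sed 's/foo/bar/' > kludge.txt; } < kludge.txt
cat kludge.txt                          # bar
```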
This works because, well, it's cheating. It really involves two files: the outer file is not really deleted by the rm command, as it's still open by virtue of the outer input redirection. The inner output redirection then really writes to a different disk file, although the operating system allows you to use the same file name because it's no longer "officially" in use at that point.
When the whole thing completes, the original file (which was surviving anonymously for the duration of the processing, feeding command's standard input) is finally deleted from disk. So this kludge still needs the same additional disk space you'd need if you used a temporary file (i.e., roughly the size of the original file).
It basically replicates what Perl does with -i when no backup extension is supplied, including keeping the original file in the risky "open-but-deleted" state for the duration of the operation. So, if one must use this method at all, one should at least be fully aware of these risks.
But then, doing this is hardly different from using an explicit temporary file, so why not do that? So, generally speaking, to accomplish almost any editing task on a file, a temporary file should be used.
Sure, if the file is big, creating a temporary file becomes increasingly expensive, and it requires an amount of free space roughly the size of the original file. Nonetheless, it's by far the sanest and safest way to do the job, and modern machines should have no disk space problems. The general method to edit a file, assuming command is the command that edits the file, is something along these lines:
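A sketch of the general method (the file name is made up, and a trivial sed invocation stands in for command):

```sh
printf 'foo one\nfoo two\n' > file.txt         # hypothetical file to edit

sed 's/foo/bar/' < file.txt > file.txt.new &&  # edit into a temporary file
mv file.txt file.txt.orig &&                   # keep the original as a backup
mv file.txt.new file.txt                       # put the new version in place
```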
That is to safeguard the original data in case something goes wrong. If preserving the original inode number (and thus permissions and other metadata) is a concern, there are various ways; here are two:
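Two possible sketches (again with a made-up file and sed standing in for command; the cat in the first and the cp in the second are what preserve the original inode, since > merely truncates an existing file rather than recreating it):

```sh
printf 'foo\n' > data.txt

# method 1: edit into a temporary file, then cat it back over the original
sed 's/foo/bar/' < data.txt > data.txt.tmp &&
cat data.txt.tmp > data.txt &&
rm data.txt.tmp

# method 2: copy the original aside, then edit the copy back over the original
cp data.txt data.txt.tmp &&
sed 's/bar/baz/' < data.txt.tmp > data.txt &&
rm data.txt.tmp
```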
These commands are slightly less efficient than the general method, as they make two passes over the data (the cat in the first method, the cp in the second). In most cases the general method works just fine and you don't need these latter methods. If you're interested in the excruciating details of these operations, this page on pixelbeat lists many more methods to replace a file using temporary files, both preserving and not preserving the metadata, with a description of the pros and cons of each.
In any case, for our purposes the important thing to remember about these methods is that the old file stays around (whether under its original name or a different one) until the new one has been completely written, so errors can be detected and the old file rolled back. This makes them the preferred way to change a file safely.
There are alternatives to the explicit temporary file, although they are somewhat inferior in the writer's opinion. On the upside, they have the advantage of generally preserving the inode and other file metadata. One such tool is sponge, from the moreutils package. Its use is very simple; as the man page says, what sponge does is "reads standard input and writes it out to the specified file.
Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows for constructing pipelines that read from and write to the same file".
So, sponge accumulates the output coming from command in memory (or, when it grows too large, guess where? In a temporary file). When the incoming stream is over, it opens file for writing and writes the new data into it; if it had to use a temporary file, it just renames that to file, which is more efficient, although it changes the file's inode. A naive reimplementation would keep everything in memory; it could be extended to use a temporary file and, for that matter, even to perform whatever job the filter that feeds its input does, but then we would be leaving the domain of this article.
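Typical usage is a pipeline that reads from and writes to the same file; for instance (assuming sponge is installed, and with a made-up file name and sed script):

```sh
sed 's/foo/bar/' file | sponge file
```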
These methods work, and they do edit the same file (inode); however, they have the disadvantage that if the amount of data is huge, there is a moderately long period of time, while the data is being written back to the file, during which part of the data exists only in memory; if the system crashes at that point, that data is lost.
If the editing to be done is not too complex, another alternative is the good old ed editor. A peculiarity of ed is that it reads its editing commands, rather than the data, from standard input. For example, to prepend "XXX" to each line of the file, it can be used as follows:
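A sketch with a made-up file (the -s flag just suppresses ed's byte-count chatter; the editing commands arrive on standard input via a here-document):

```sh
printf 'one\ntwo\n' > notes.txt
ed -s notes.txt <<'EOF'
,s/^/XXX/
w
q
EOF
cat notes.txt                           # XXXone / XXXtwo
```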
At least in most implementations, ed does create a temporary file, which it uses as backing storage for the editing operations; when it is asked to save the changes, it writes them back to the original file.
As POSIX puts it: "The ed utility shall operate on a copy of the file it is editing; changes made to the copy shall have no effect on the file until a w (write) command is given. The copy of the text is called the buffer." So, it should be clear that ed presents the same shortcomings as the sponge-like methods; in particular, when it's requested to perform a write (the "w" command), ed truncates the original file and writes the contents of the buffer into it. If the amount of data is huge, this means that there's a moderately long time window during which the file is in an inconsistent state, until ed has written back all the data; during that window, no other copy of the original data exists.
Consider this if you're worried about unexpected things happening in the middle of the process. Having said all this, we still see that, for some mysterious reason, people keep trying to do away with temporary files, and come up with "creative" solutions.
Here are some of them. They are all broken and must not be used for any reason. Obviously, naive attempts such as redirecting a command's output straight back to its own input file (e.g. sed 's/foo/bar/' file > file) cannot work, because the file is truncated by the shell as soon as the last part of the pipeline is started (for any practical purpose, this means "immediately"). But, after thinking a bit about that, something "clicks" in the mind of whoever is writing the code, which generally leads to the following "clever" hack:
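The hack usually looks something like this (a reconstruction with placeholder names, shown only so it can be dissected; never use it):

```sh
# BROKEN: delay the truncating side of the pipeline with a sleep,
# hoping that command finishes reading file before cat truncates it
command file | { sleep 10; cat > file; }
```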
And that indeed appears to work. Except it's utterly wrong, and may bite you when you least expect it, with very bad consequences: things that seem to work are almost always much worse and more dangerous than things that patently don't, because they can give a false sense of security.
So, what's wrong with it? The idea behind the hack is "let's sleep 10 seconds, so the command can read the whole file and do its job before the file is truncated and the fresh data coming from the pipe is written to it". Let's ignore the fact that 10 seconds may or may not be enough (and the same goes for whatever value you choose). There's something much more seriously, fundamentally wrong here.
Let's see what happens if the file is even moderately big. The right-hand side of the pipeline will not consume any data coming from the pipe for 10 seconds (or however many seconds). This means that whatever command outputs goes into the pipe and just sits there, at least until the sleep is finished. But of course a pipe cannot hold an infinite amount of data; its capacity is usually fairly limited (some tens of kilobytes is typical, although it's implementation-dependent).
Now, what happens if the output of command fills the pipe before the sleep has finished? At some point a write performed by command will block, and if command is like most programs, that means that command itself will block.
In particular, it will not read anything else from the file. So it's entirely possible, especially if the input file (and hence the output) is moderately large, that command will block without ever having read the input file fully.

To use a standards-compliant sed, which does not have -i, to do in-place editing in such a way that permissions, file ownership and some other metadata are not modified, and hard links and symlinks are not broken:
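A sketch of that approach (the file name and the s/foo/bar/ edit are made up):

```sh
printf 'foo\n' > conf.txt                   # the file to edit

cp conf.txt conf.txt.tmp &&                 # temporary copy of the original
sed 's/foo/bar/' conf.txt.tmp > conf.txt && # edit the copy, rewrite the original
rm conf.txt.tmp                             # remove the copy once done
```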
This creates a temporary copy of the file we want to edit, and then applies the sed command to the copy while redirecting the result to the original filename. Redirecting to the original filename truncates and rewrites that file, and since the file is not first deleted and recreated, it retains most of its metadata. This is not possible on AIX even with the sed tool installed; you do need to use a temp file, as suggested by terdon in the comments to the question:
Define a variable and use a subshell to execute sed and redirect the result to a file.
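A sketch of that idea (note that command substitution runs sed in a subshell, strips trailing newlines, and cannot hold NUL bytes, so this is only suitable for ordinary text files; names are made up):

```sh
printf 'foo one\nfoo two\n' > notes2.txt
new=$(sed 's/foo/bar/' notes2.txt) &&       # edited text held in a variable
printf '%s\n' "$new" > notes2.txt           # then written back over the original
```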
As one commenter notes, if you don't see -i in the man page, you can be pretty sure it doesn't exist; perl -pi is an alternative (with the caveats discussed above).