Processing & Reading Multiple Files Simultaneously Using fileinput In Python

Sachin Pal
8 min read · Jul 26, 2023

The fileinput module is part of the standard library and is used to iterate over the lines of multiple files as one continuous stream. The files are read one after another, not in parallel. Python's built-in open() function can also iterate over file contents, but only for one file at a time.

You’ll explore the classes and functions provided by the fileinput module to iterate over multiple files.

One note: fileinput can also iterate over a single file, but the open() function is usually the better choice for that.

Basic Usage

import fileinput

# Create a fileinput instance and pass multiple files
stream = fileinput.input(files=('test.txt',
                                'sample.txt',
                                'index.html'))

# Iterate over the content line by line
for data in stream:
    print(data)

The fileinput module is imported first, and then a fileinput instance is created by calling fileinput.input() with a tuple of files (test.txt, sample.txt, and index.html). The call returns an iterator over the lines of those files.

The contents of the files are then iterated and printed using the for loop.

Hi, I am a test file.
Hi, I am a sample file for testing.
<html lang="en">
<head>
<title>Test HTML File</title>
</head>
<body>
<h1>Hi, I am a simple HTML File.</h1>
</body>
</html>

Another approach is to use fileinput.input() as a context manager. This is safer because it ensures that the fileinput instance is closed even if an exception occurs.

import fileinput

with fileinput.input(files=('test.txt', 'sample.txt')) as files:
    for data in files:
        print(data)

In the above demonstration, fileinput.input() is used as a context manager with the with statement.

The call returns an iterator, which the as clause assigns to the files variable. The lines are then iterated through that variable.

Hi, I am a test file.
Hi, I am a sample file for testing.

The fileinput.input() Function

The fileinput.input() function is the primary interface of the fileinput module, and it covers most of what the module is used for. You saw a glimpse of it in the previous section. This time, you'll learn more about it.

Syntax

fileinput.input(files=None, inplace=False, backup='', *, mode='r', openhook=None, encoding=None, errors=None)

Parameters:

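files: Defaults to None. Takes a single file name or a sequence of file names to be processed. When it is None, the names are taken from sys.argv[1:], and if that is empty, sys.stdin is read.

inplace: Defaults to False. When set to True, standard output is redirected to the file being read, so whatever is printed replaces the file's content.

backup: Defaults to an empty string. Specifies the extension for the backup files that are kept when inplace is set to True.

mode: Defaults to 'r'. Files can only be opened for reading, so the mode must be 'r' or 'rb'. (The 'rU' and 'U' modes existed in older versions and were removed in Python 3.11.)

openhook: Defaults to None. A custom function for controlling how files are opened. It cannot be combined with inplace.

encoding: Defaults to None. Specifies the encoding to be used to read the files (added in Python 3.10).

errors: Defaults to None. Specifies how encoding and decoding errors should be handled (added in Python 3.10).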
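As a small illustration of the files parameter, the sketch below passes first a single file name and then a tuple of names. The file names (first.txt, second.txt) and the read_lines() helper are made up for this example, and the files are created in a temporary directory so the snippet can run on its own.

```python
import fileinput
import os
import tempfile


def read_lines(files):
    """Return every line that fileinput yields for the given files argument."""
    # The context manager makes sure the files are closed afterwards.
    with fileinput.input(files=files) as stream:
        return [line for line in stream]


# Hypothetical demo files, created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first.txt")
    second = os.path.join(tmp, "second.txt")
    with open(first, "w") as f:
        f.write("line from first\n")
    with open(second, "w") as f:
        f.write("line from second\n")

    single = read_lines(first)             # one file name as a plain string
    several = read_lines((first, second))  # a tuple of file names

print(single)
print(several)
```

Both forms go through the same iterator; the tuple form simply continues with the next file once the previous one is exhausted.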

Modifying the Files Before Reading

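import fileinput

with fileinput.input(files=('test.txt', 'sample.txt'), inplace=True) as files:
    for data in files:
        modified_content = data.lower()
        # With inplace=True, print() writes into the file; end='' avoids
        # adding a second newline after the one the line already has
        print(modified_content, end='')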

The inplace parameter is set to True in the above code. While a file is being read, standard output is redirected to that file, so everything that print() emits is written back into it.

The code above lowercases the content of both files (test.txt and sample.txt).
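To see in-place editing from start to finish, here is a minimal, self-contained sketch. It creates a throwaway file (demo.txt is a hypothetical name) in a temporary directory, lowercases it in place through a helper written for this example, and reads it back with open() to confirm that the file itself changed.

```python
import fileinput
import os
import tempfile


def lowercase_in_place(path):
    """Rewrite the file at `path` so that every line is lowercase."""
    with fileinput.input(files=path, inplace=True) as stream:
        for line in stream:
            # While inplace=True, print() writes into the file, not the terminal.
            # end="" avoids doubling the newline that `line` already carries.
            print(line.lower(), end="")


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")  # hypothetical file name
    with open(demo_path, "w") as f:
        f.write("Hello World\nSECOND Line\n")

    lowercase_in_place(demo_path)

    # Read the file back to confirm that it was modified on disk.
    with open(demo_path) as f:
        result = f.read()

print(repr(result))
```

Note that any line you do not print is dropped from the file, so a loop that filters lines doubles as a way to delete them.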

Storing Backup of Files

When the inplace parameter is set to True, the original files are overwritten, but their original state can be saved in separate files using the backup parameter.

import fileinput

with fileinput.input(files=('test.txt', 'sample.txt'),
                     inplace=True, backup='.bak') as files:
    for data in files:
        modified_content = data.capitalize()
        # end='' keeps print() from adding an extra newline to the file
        print(modified_content, end='')

The above code capitalizes each line (the first character becomes uppercase and the rest lowercase), and the original files are saved as test.txt.bak and sample.txt.bak because of backup='.bak'.
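The following sketch checks the backup behaviour on a temporary file. The file name notes.txt and the capitalize_with_backup() helper are assumptions made for this example. After the in-place edit, the edited file and its .bak copy are both read back.

```python
import fileinput
import os
import tempfile


def capitalize_with_backup(path, backup_ext=".bak"):
    """Capitalize each line of `path` in place, keeping the original as a backup."""
    with fileinput.input(files=path, inplace=True, backup=backup_ext) as stream:
        for line in stream:
            print(line.capitalize(), end="")


with tempfile.TemporaryDirectory() as tmp:
    notes_path = os.path.join(tmp, "notes.txt")  # hypothetical file name
    with open(notes_path, "w") as f:
        f.write("first line\nsecond line\n")

    capitalize_with_backup(notes_path)

    with open(notes_path) as f:
        edited = f.read()        # the rewritten file
    with open(notes_path + ".bak") as f:
        original = f.read()      # the untouched backup copy

print(repr(edited))
print(repr(original))
```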

Controlling the Opening of the File

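import fileinput

def custom_open(filename, mode):
    # Append the text first; the with block closes the file so the write is flushed
    with open(filename, "a+") as data:
        data.write(" Data added through function.")
    # Reopen the file in the mode requested by fileinput
    return open(filename, mode)

with fileinput.input(files=("test.txt", "sample.txt"), openhook=custom_open) as file:
    for data in file:
        print(data)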

The custom_open() function takes two parameters, filename and mode. It opens the file for appending, writes the string, and then returns the file reopened in the requested mode.

"The hook must be a function that takes two arguments, filename and mode, and returns an accordingly opened file-like object." (Source: Python documentation)

The files are then passed to fileinput.input() with the openhook parameter set to custom_open, so custom_open() is in charge of opening each file. The file content is then iterated and printed.

Hi, i am a test file. Data added through function.
Hi, i am a sample file for testing. Data added through function.
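A hook does not have to modify the files. As a rough sketch of a read-only hook, the example below records each file name as it is opened and pins the encoding to UTF-8. The hook name logging_open, the opened_files list, and the file names are all invented for this illustration.

```python
import fileinput
import os
import tempfile

opened_files = []  # filled in by the hook below


def logging_open(filename, mode):
    """Hypothetical openhook: note which file is being opened, then open it."""
    opened_files.append(os.path.basename(filename))
    # Pin the encoding so the result does not depend on the platform default.
    return open(filename, mode, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in (("a.txt", "alpha\n"), ("b.txt", "beta\n")):
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        paths.append(path)

    # fileinput calls logging_open() once per file, right before reading it.
    with fileinput.input(files=paths, openhook=logging_open) as stream:
        hook_lines = [line for line in stream]

print(opened_files)
print(hook_lines)
```

The hook is called lazily, so a file that is never reached (for example because the loop ends early) is never opened.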

Reading Unicode Characters

Suppose you have a file that contains Unicode characters. To read it correctly, the file must be decoded with the right encoding.

# A single file name can be passed as a plain string
with fileinput.input(files='test_unicode.txt', encoding='utf-8') as files:
    for data in files:
        print(data)

Assuming the file is saved as UTF-8, the encoding parameter is set to utf-8 so that the characters are decoded correctly.

😁😂😅

Handling Errors

The errors parameter controls how decoding errors are handled. Take the above code as an example: if the encoding is not specified and the platform's default encoding is not UTF-8, reading the file may raise a UnicodeDecodeError.

# No encoding is given, so the platform default is used
with fileinput.input(files='test_bin.txt', errors='ignore') as files:
    for data in files:
        print(data)

----------
ðŸ˜ðŸ˜‚😅

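The errors parameter is set to ignore, which means that bytes that cannot be decoded are silently dropped. The output above is garbled, most likely because the remaining bytes were decoded with an encoding other than the one the file was saved in. The errors parameter can also be set to strict (raise an exception when an error occurs) or replace (substitute a replacement marker for the data that cannot be decoded).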

Functions to Access Input File Information

The module also provides functions for accessing information about the input files being processed by fileinput.input().

Getting the File Names

Using the fileinput.filename() function, the name of the file currently being read can be displayed.

with fileinput.input(files=('test.txt', 'sample.txt')) as files:
    for data in files:
        print(f"File: {fileinput.filename()}")
        print(data)

Output

File: test.txt
Hi, i am a test file. Data added through function. Added data to the file.
File: sample.txt
Hi, i am a sample file for testing. Data added through function. Added data to the file.

Getting the File Descriptor and Line and File Line Number

The fileinput.fileno() function returns the active file's file descriptor, the fileinput.lineno() function returns the cumulative line number, and the fileinput.filelineno() function returns the line number of the currently processed file.

with fileinput.input(files=('test.txt', 'sample.txt')) as files:
    for data in files:
        print(f"{fileinput.filename()}'s File Descriptor: {fileinput.fileno()}")
        print(f"{fileinput.filename()}'s File Line Number: {fileinput.filelineno()}")
        print(f"{fileinput.filename()}'s File Cumulative Line No.: {fileinput.lineno()}")

Output

test.txt's File Descriptor: 3
test.txt's File Line Number: 1
test.txt's File Cumulative Line No.: 1

sample.txt's File Descriptor: 3
sample.txt's File Line Number: 1
sample.txt's File Cumulative Line No.: 2
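The difference between the two counters is easier to see when the files have more than one line. The sketch below uses two temporary files with two lines each (one.txt and two.txt are hypothetical names) and records the file name, the per-file line number, and the cumulative line number for every line.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("one.txt", "two.txt"):
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            f.write("first\nsecond\n")
        paths.append(path)

    counters = []
    with fileinput.input(files=paths) as stream:
        for _ in stream:
            counters.append((
                os.path.basename(fileinput.filename()),
                fileinput.filelineno(),  # restarts at 1 for every file
                fileinput.lineno(),      # keeps counting across files
            ))

for row in counters:
    print(row)
```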

Checking Reading Status

with fileinput.input(files=('test.txt', 'sample.txt')) as files:
    for data in files:
        print(f"Read First Line: {fileinput.isfirstline()}")
        print(f"Last Line Read From sys.stdin: {fileinput.isstdin()}")

----------
Read First Line: True
Last Line Read From sys.stdin: False
Read First Line: True
Last Line Read From sys.stdin: False

The fileinput.isfirstline() function returns True if the line just read is the first line of its file, and False otherwise. Since both files contain a single line, it returned True each time.

The fileinput.isstdin() function returns True if the last line was read from sys.stdin, and False otherwise.
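One practical use of fileinput.isfirstline() is skipping a header row that is repeated at the top of every file. The sketch below assumes two small CSV files (jan.csv and feb.csv, both invented for this example) and a helper named rows_without_headers(), and keeps only the data rows.

```python
import fileinput
import os
import tempfile


def rows_without_headers(paths):
    """Yield every line except the first line (the header) of each file."""
    with fileinput.input(files=paths) as stream:
        for line in stream:
            if fileinput.isfirstline():
                continue  # the header row of the file that just started
            yield line.rstrip("\n")


with tempfile.TemporaryDirectory() as tmp:
    csv_paths = []
    for name, body in (("jan.csv", "name,total\nann,3\n"),
                       ("feb.csv", "name,total\nbob,5\ncid,7\n")):
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            f.write(body)
        csv_paths.append(path)

    # Consume the generator while the temporary files still exist.
    data_rows = list(rows_without_headers(csv_paths))

print(data_rows)
```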

Closing the File

When fileinput.input() is used as a context manager with the with statement, the files are closed automatically. The fileinput.close() function can also be called explicitly to release the resources as soon as the work is done.

import fileinput

with fileinput.input(files=('test.txt', 'sample.txt')) as file:
    for data in file:
        # Compare the line length directly (ignoring the trailing newline)
        if len(data.rstrip('\n')) > 25:
            fileinput.close()
            print('File has more than 25 characters.')
        else:
            print(data)

The above code demonstrates the use of the fileinput.close() function: if a line contains more than 25 characters, the input is closed and a message is printed; otherwise, the line is printed.

File has more than 25 characters.

Because the first line contained more than 25 characters, the input was closed, the message was printed, and the iteration ended.
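Calling fileinput.close() inside the loop ends the whole sequence, not only the current file. The sketch below illustrates this with three temporary files whose names and contents are invented for the example, and a helper named read_until_long_line(). Nothing after the long line is read, including the remaining files.

```python
import fileinput
import os
import tempfile


def read_until_long_line(paths, limit=25):
    """Collect lines until one is longer than `limit` characters, then stop."""
    collected = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            text = line.rstrip("\n")
            if len(text) > limit:
                fileinput.close()  # ends the whole sequence, not just this file
            else:
                collected.append(text)
    return collected


with tempfile.TemporaryDirectory() as tmp:
    contents = (
        ("short.txt", "short line\n"),
        ("long.txt", "this line is definitely longer than the limit\nnever reached\n"),
        ("last.txt", "also never reached\n"),
    )
    paths = []
    for name, body in contents:
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            f.write(body)
        paths.append(path)

    kept = read_until_long_line(paths)

print(kept)
```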

The FileInput Class

The fileinput.FileInput class is an object-oriented alternative to the fileinput.input() function. The parameters are identical to those of the input() function.

Syntax

fileinput.FileInput(files=None, inplace=False, backup='', *, mode='r', openhook=None, encoding=None, errors=None)

Example

import fileinput

class OpenMultipleFiles:
    def __init__(self, *args):
        self.args = args

    def custom_open(self, filename, mode):
        # Append the text; the with block closes the file so the write is flushed
        with open(filename, "a+") as data:
            data.write(" Added data to the file.")
        return open(filename, mode)

    def read(self):
        # Use this instance's own method as the hook
        with fileinput.FileInput(files=self.args, openhook=self.custom_open) as file:
            for data in file:
                print(data)

obj = OpenMultipleFiles('test.txt', 'sample.txt')
obj.read()

The class OpenMultipleFiles is defined in the above code. The class has an __init__ method that takes a variable number of arguments.

A custom_open method is defined within the class. It opens the file in append+read mode, writes some data to the file, and returns the file reopened in the requested mode.

The read method creates an instance of fileinput.FileInput, passing self.args as the files argument and the class's custom_open method as the openhook parameter. The contents of the files are then iterated and printed.

Finally, an instance of the OpenMultipleFiles class is created with the file names (test.txt and sample.txt) and stored in the obj variable. The read method is then invoked on obj to read the specified files.

Hi, i am a test file. Data added through function. Added data to the file.
Hi, i am a sample file for testing. Data added through function. Added data to the file.
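One thing the class makes possible is having several independent readers open at once, whereas fileinput.input() keeps a single module-level instance and raises a RuntimeError if it is called again while that instance is still active. The sketch below walks two temporary files in lockstep with zip(). The file names left.txt and right.txt are hypothetical.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    left_path = os.path.join(tmp, "left.txt")
    right_path = os.path.join(tmp, "right.txt")
    with open(left_path, "w") as f:
        f.write("a\nb\n")
    with open(right_path, "w") as f:
        f.write("1\n2\n")

    pairs = []
    # Two separate FileInput objects, each with its own line counter.
    with fileinput.FileInput(files=left_path) as left, \
         fileinput.FileInput(files=right_path) as right:
        for left_line, right_line in zip(left, right):
            # The instance methods mirror the module-level functions.
            pairs.append((left.lineno(), left_line.strip(), right_line.strip()))

print(pairs)
```

The instance methods (lineno(), filename(), and so on) report the state of that one reader only, so the two streams never interfere with each other.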

Comparison

Let’s see how long it takes to process the contents of multiple files using the open() function and the fileinput.input() function.

import timeit

# open() Function Code
code = '''
with open('test.txt') as f1, open('sample.txt') as f2:
    f1.read()
    f2.read()
'''

print(f"Open Function Benchmark: {timeit.timeit(stmt=code, number=1000)}")

# fileinput Code
setup = 'import fileinput'

code = '''
with fileinput.input(files=('test.txt', 'sample.txt')) as file:
    for data in file:
        data
'''
print(f"Fileinput Benchmark: {timeit.timeit(setup=setup, stmt=code, number=1000)}")

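Using the timeit module, the above code measures how long it takes to process the contents of the files 1000 times with the open() function and with the fileinput.input() function. Note that the two snippets do not do identical work: open() reads each file in a single call, whereas fileinput yields it line by line. The numbers are therefore only a rough indication of which approach is more efficient.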

Open Function Benchmark: 0.3948998999840114
Fileinput Benchmark: 0.4962893000047188

Limitations

Every module is powerful in its own right, but each also has limitations, and the fileinput module is no exception.

  • It reads files only line by line; there is no equivalent of read() for loading a whole file at once.
  • It cannot write or append data to files, except by rewriting them through inplace=True.
  • It cannot perform advanced file-handling operations.
  • It is slower than calling open() directly, as the benchmark above suggests, which can matter when processing large files.

Conclusion

Let’s recall what you’ve learned:

  • An overview of the fileinput module
  • Basic usage of the fileinput.input() with and without context manager
  • The fileinput.input() function and its parameters with examples
  • A glimpse of the FileInput class
  • Comparison of fileinput.input() function with open() function for processing multiple files simultaneously
  • Some limitations of the fileinput module

That’s all for now.

Keep coding✌✌

Originally published at https://geekpython.in.
