Two Ways to Stream Large Files in Python - 图灵python

1. Read the file in chunks with the read method

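The first approach uses the lower-level file.read() method. Unlike iterating over the file object directly, which returns one line at a time, each call to file.read(chunk_size) reads forward from the current position and returns at most chunk_size characters right away, with no need to wait for a newline to appear.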

def count_nine_v2(fname):
    """Count how many '9' digits the file contains, reading 8 KB at a time
    """
    count = 0
    block_size = 1024 * 8
    with open(fname) as fp:
        while True:
            chunk = fp.read(block_size)
            # When the file has no more content, read() returns an empty string ''
            if not chunk:
                break
            count += chunk.count('9')
    return count
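
To make the chunking behaviour concrete, here is a minimal sketch that runs the same loop against an in-memory text stream with no newline in it at all. The io.StringIO stand-in, the sample content, and the tiny chunk size of 8 are assumptions chosen for illustration; they are not part of the original example.

```python
import io

# In-memory stand-in for a file whose content contains no newline at all
fp = io.StringIO("9" * 20 + "abc")

chunks = []
while True:
    chunk = fp.read(8)  # ask for at most 8 characters per call
    if not chunk:       # '' signals that the stream is exhausted
        break
    chunks.append(chunk)

# Every chunk except possibly the last has exactly 8 characters
print(chunks)
print(sum(chunk.count('9') for chunk in chunks))
```

Memory use is bounded by the chunk size rather than by the length of the longest line, which is the point of reading this way.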

2. Decouple the code with a generator

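We can define a new generator function, chunked_file_reader, that takes care of all the logic related to "producing the data".

The main loop in count_nine_v3 then only has to do the counting.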

def chunked_file_reader(fp, block_size=1024 * 8):
    """Generator function: read the file content in chunks
    """
    while True:
        chunk = fp.read(block_size)
        # When the file has no more content, read() returns an empty string ''
        if not chunk:
            break
        yield chunk


def count_nine_v3(fname):
    count = 0
    with open(fname) as fp:
        for chunk in chunked_file_reader(fp):
            count += chunk.count('9')
    return count
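
One practical benefit of this split is that the reader can be reused by other consumers and exercised without touching the disk. The sketch below is only an illustration: count_char and total_length are hypothetical helpers that do not appear in the original, and io.StringIO stands in for a real file.

```python
import io


def chunked_file_reader(fp, block_size=1024 * 8):
    """Yield successive chunks of at most block_size characters from fp."""
    while True:
        chunk = fp.read(block_size)
        if not chunk:  # '' means there is nothing left to read
            break
        yield chunk


def count_char(fp, char, block_size=1024 * 8):
    # Hypothetical consumer: count occurrences of a single character
    return sum(chunk.count(char) for chunk in chunked_file_reader(fp, block_size))


def total_length(fp, block_size=1024 * 8):
    # Hypothetical second consumer: same reader, different computation
    return sum(len(chunk) for chunk in chunked_file_reader(fp, block_size))


print(count_char(io.StringIO("1999-09-09"), '9', block_size=4))
print(total_length(io.StringIO("1999-09-09"), block_size=4))
```

Counting per chunk is safe here because the target is a single character. A multi-character pattern could straddle a chunk boundary, so it would need extra handling that this sketch does not attempt.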

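The generator can be simplified further with iter(callable, sentinel). Called in this form, iter returns a special iterator object; iterating over it keeps producing the results of calling callable until a result equals sentinel, at which point the iteration stops.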

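from functools import partial


def chunked_file_reader(file, block_size=1024 * 8):
    """Generator function: read the file content in chunks, using the iter function
    """
    # First, partial(file.read, block_size) builds a new function that takes no arguments
    # The loop keeps yielding the result of file.read(block_size) until it returns ''
    for chunk in iter(partial(file.read, block_size), ''):
        yield chunk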
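
One detail worth keeping in mind: the sentinel has to match what read() actually returns at end of file. For a file opened in binary mode that value is b'' rather than '', so a '' sentinel would never match and the loop would not terminate. The following is a minimal sketch of a binary variant; the name chunked_bytes_reader and the io.BytesIO sample data are assumptions made for illustration.

```python
import io
from functools import partial


def chunked_bytes_reader(file, block_size=1024 * 8):
    """Yield chunks from a file object opened in binary mode."""
    # In binary mode read() returns b'' at end of file, so that is the sentinel
    for chunk in iter(partial(file.read, block_size), b''):
        yield chunk


# In-memory stand-in for a file opened with open(fname, 'rb')
data = io.BytesIO(b"919293")
nines = sum(chunk.count(b'9') for chunk in chunked_bytes_reader(data, block_size=4))
print(nines)
```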

Those are the two ways to stream large files in Python. Hope you find them helpful!