X Tutup
>你这论断人的,无论你是谁,也无可推诿,你在甚么事上论断人,就在甚么事上定自己的罪。因你这论断人的,自己所行却和别人一样。我们知道这样行的人,神必照真理审判他。(ROMANS 2:1-2) #文件(2) 文件,也是对象。 在Python 2中还是一种内建的类型,在Python 3中被取消了,可是,这并不意味这其地位降低。 ##文件的状态 很多时候,我们需要获取一个文件的有关状态(也称为属性),比如创建日期,访问日期,修改日期,大小,等等。在os模块中,有这样一个方法,专门让我们查看文件的这些状态参数的。 >>> import os >>> file_stat = os.stat("131.txt") #查看这个文件的状态 >>> file_stat #文件状态是这样的。从下面的内容,有不少从英文单词中可以猜测出来。 posix.stat_result(st_mode=33204, st_ino=5772566L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=69L, st_atime=1407897031, st_mtime=1407734600, st_ctime=1407734600) >>> file_stat.st_ctime #这个是文件创建时间 1407734600.0882277 这是什么时间?看不懂!别着急,换一种方式。在Python中,有一个模块`time`,是专门针对时间设计的。 >>> import time >>> time.localtime(file_stat.st_ctime) #这回看清楚了。 time.struct_time(tm_year=2014, tm_mon=8, tm_mday=11, tm_hour=13, tm_min=23, tm_sec=20, tm_wday=0, tm_yday=223, tm_isdst=0) ##read/readline/readlines 用`open()`能够打开文件,在用for循环,可以将文件的内容读取出来。 但是,在查看文件的属性和方法的时候,会看到三个方法:`read, readline, readlines`。 从名称上看,它们应该都是跟“读”有关的,但是,又应该有差别。 的确如此。 还是对比着看一看,并且还要Python 2和Python 3的文档对比。 Python 2: >>> help(file.read) Help on method_descriptor: read(...) read([size]) -> read at most size bytes, returned as a string. If the size argument is negative or omitted, read until EOF is reached. Notice that when in non-blocking mode, less data than what was requested may be returned, even if no size parameter was given. >>> help(file.readline) Help on method_descriptor: readline(...) readline([size]) -> next line from the file, as a string. Retain newline. A non-negative size argument limits the maximum number of bytes to return (an incomplete line may be returned then). Return an empty string at EOF. >>> help(file.readlines) Help on method_descriptor: readlines(...) readlines([size]) -> list of strings, each a line from the file. Call readline() repeatedly and return a list of the lines so read. The optional size argument, if given, is an approximate bound on the total number of bytes in the lines returned. Python 3: >>> f = open("130.txt") >>> help(f.read) Help on built-in function read: read(size=-1, /) method of _io.TextIOWrapper instance Read at most n characters from stream. Read from underlying buffer until we have n characters or we hit EOF. If n is negative or omitted, read until EOF. >>> help(f.readline) Help on built-in function readline: readline(size=-1, /) method of _io.TextIOWrapper instance Read until newline or EOF. Returns an empty string if EOF is hit immediately. >>> help(f.readlines) Help on built-in function readlines: readlines(hint=-1, /) method of _io.TextIOWrapper instance Return a list of lines from the stream. hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint. 对照一下上面的说明,三个的异同就显现了。而且,要想了解Python 2和Python 3之间的不同,还可以将两个版本的对照一下。好像也没有什么不同哦。 EOF什么意思?End-of-file。在[维基百科](http://en.wikipedia.org/wiki/End-of-file)中居然有对它的解释: In computing, End Of File (commonly abbreviated EOF[1]) is a condition in a computer operating system where no more data can be read from a data source. The data source is usually called a file or stream. In general, the EOF is either determined when the reader returns null as seen in Java's BufferedReader,[2] or sometimes people will manually insert an EOF character of their choosing to signal when the file has ended. 明白EOF之后,就对比一下: - read:如果指定了参数size,就按照该指定长度从文件中读取内容,否则,就读取全文。被读出来的内容,全部塞到一个字符串里面。这样有好处,就是东西都到内存里面了,随时取用,比较快捷;“成也萧何败萧何”,也是因为这点,如果文件内容太多了,内存会吃不消的。文档中已经提醒注意在“non-blocking”模式下的问题,关于这个问题,不是本节的重点,暂时不讨论。 - readline:那个可选参数size的含义同上。它则是以行为单位返回字符串,也就是每次读一行,依次循环,如果不限定size,直到最后一个返回的是空字符串,意味着到文件末尾了(EOF)。 - readlines:size同上。它返回的是以行为单位的列表,即相当于先执行`readline()`,得到每一行,然后把这一行的字符串作为列表中的元素塞到一个列表中,最后将此列表返回。 依次演示操作,即可明了。有这样一个文档,名曰:you.md,其内容和基本格式如下: You Raise Me Up When I am down and, oh my soul, so weary; When troubles come and my heart burdened be; Then, I am still and wait here in the silence, Until you come and sit awhile with me. You raise me up, so I can stand on mountains; You raise me up, to walk on stormy seas; I am strong, when I am on your shoulders; You raise me up: To more than I can be. 分别用上述三种函数读取这个文件。 >>> f = open("you.md") >>> content = f.read() >>> content 'You Raise Me Up\nWhen I am down and, oh my soul, so weary;\nWhen troubles come and my heart burdened be;\nThen, I am still and wait here in the silence,\nUntil you come and sit awhile with me.\nYou raise me up, so I can stand on mountains;\nYou raise me up, to walk on stormy seas;\nI am strong, when I am on your shoulders;\nYou raise me up: To more than I can be.\n' >>> f.close() **提示:养成一个好习惯,**只要打开文件,不用该文件了,就一定要随手关闭它。 >注意:在python中,'\n'表示换行,这也是UNIX系统中的规范。但是,在奇葩的windows中,用'\r\n'表示换行。Python在处理这个的时候,会自动将'\r\n'转换为'\n'。 请仔细观察,得到的就是一个大大的字符串,但是这个字符串里面包含着一些符号`\n`,因为原文中有换行符。如果用print输出这个字符串,就是这样的了,其中的`\n`起作用了。 >>> print content #Python 3: print(content) You Raise Me Up When I am down and, oh my soul, so weary; When troubles come and my heart burdened be; Then, I am still and wait here in the silence, Until you come and sit awhile with me. You raise me up, so I can stand on mountains; You raise me up, to walk on stormy seas; I am strong, when I am on your shoulders; You raise me up: To more than I can be. 用`readline()`读取,则是这样的: >>> f = open("you.md") >>> f.readline() 'You Raise Me Up\n' >>> f.readline() 'When I am down and, oh my soul, so weary;\n' >>> f.readline() 'When troubles come and my heart burdened be;\n' >>> f.close() 显示出一行一行读取了,每操作一次`f.readline()`,就读取一行,并且将指针向下移动一行,如此循环。显然,这种是一种循环,或者说可迭代的。因此,就可以用循环语句来完成对全文的读取。 #!/usr/bin/env python # coding=utf-8 f = open("you.md") while True: line = f.readline() if not line: #到EOF,返回空字符串,则终止循环 break print line , #Python 3: print(line, end='') f.close() #别忘记关闭文件 将其和文件"you.md"保存在同一个目录中,我这里命名的文件名是12701.py,然后在该目录中运行`python 12701.py`,就看到下面的效果了: ~/Documents$ python 12701.py You Raise Me Up When I am down and, oh my soul, so weary; When troubles come and my heart burdened be; Then, I am still and wait here in the silence, Until you come and sit awhile with me. You raise me up, so I can stand on mountains; You raise me up, to walk on stormy seas; I am strong, when I am on your shoulders; You raise me up: To more than I can be. 也用`readlines()`来读取此文件: >>> f = open("you.md") >>> content = f.readlines() >>> content ['You Raise Me Up\n', 'When I am down and, oh my soul, so weary;\n', 'When troubles come and my heart burdened be;\n', 'Then, I am still and wait here in the silence,\n', 'Until you come and sit awhile with me.\n', 'You raise me up, so I can stand on mountains;\n', 'You raise me up, to walk on stormy seas;\n', 'I am strong, when I am on your shoulders;\n', 'You raise me up: To more than I can be.\n'] 返回的是一个列表,列表中每个元素都是一个字符串,每个字符串中的内容就是文件的一行文字,含行末的符号。显而易见,它是可以用for来循环的。 >>> for line in content: ... print line , #Python 3: print(line, end='') ... You Raise Me Up When I am down and, oh my soul, so weary; When troubles come and my heart burdened be; Then, I am still and wait here in the silence, Until you come and sit awhile with me. You raise me up, so I can stand on mountains; You raise me up, to walk on stormy seas; I am strong, when I am on your shoulders; You raise me up: To more than I can be. >>> f.close() ##读很大的文件 前面已经说明了,如果文件太大,就不能用`read()`或者`readlines()`一次性将全部内容读入内存,可以使用while循环和`readline()`来完成这个任务。 此外,还有一个方法:`fileinput`模块 >>> import fileinput >>> for line in fileinput.input("you.md"): ... print line , #Python 3: print(line, end='') ... You Raise Me Up When I am down and, oh my soul, so weary; When troubles come and my heart burdened be; Then, I am still and wait here in the silence, Until you come and sit awhile with me. You raise me up, so I can stand on mountains; You raise me up, to walk on stormy seas; I am strong, when I am on your shoulders; You raise me up: To more than I can be. 我比较喜欢这个,用起来是那么得心应手,简洁明快,还用for。 对于这个模块的更多内容,读者可以自己在交互模式下利用`dir()`,`help()`去查看明白。 还有一种方法,更为常用: >>> for line in f: ... print line , #Python 3: print(line, end='') ... You Raise Me Up When I am down and, oh my soul, so weary; When troubles come and my heart burdened be; Then, I am still and wait here in the silence, Until you come and sit awhile with me. You raise me up, so I can stand on mountains; You raise me up, to walk on stormy seas; I am strong, when I am on your shoulders; You raise me up: To more than I can be. 之所以能够如此,是因为文件是可迭代的对象,直接用for来迭代即可。 ##seek 这个函数的功能就是让指针移动。比如: >>> f = open("you.md") >>> f.readline() 'You Raise Me Up\n' >>> f.readline() 'When I am down and, oh my soul, so weary;\n' 现在来看`seek()`的能力: >>> f.seek(0) 意图是要回到文件的最开头,那么如果用`f.readline()`应该读取第一行。 >>> f.readline() 'You Raise Me Up\n' 果然如此。此时指针所在的位置,还可以用`tell()`来显示,如 >>> f.tell() 17L 这不是在低17行。在Python 2中返回的是`17L`,很容易让人产生如此误解。但是在Python 3中返回的是`17`,又可能迷茫,单位是什么呢? 读者不妨数一数,是按照字符,`'You Raise Me Up\n'`从0开始,到最后,是不是正好为`17`。提醒注意的是,如果你用汉语的等文字,就需要注意编码问题了。请查看本教程有关编码的讨论。 >>> f.seek(4) `f.seek(4)`就将位置定位到从开头算起的第四个字符后面,也就是"You "之后,字母"R"之前的位置。 >>> f.tell() 4L #Python 3返回:4 `tell()`也是这么说的。这时候如果使用`readline()`,得到就是从当前位置开始到行末。 >>> f.readline() 'Raise Me Up\n' >>> f.close() `seek()`还有别的参数,具体如下: >seek(...) > seek(offset[, whence]) -> None. Move to new file position. > > Argument offset is a byte count. Optional argument whence defaults to > 0 (offset from start of file, offset should be >= 0); other values are 1 > (move relative to current position, positive or negative), and 2 (move > relative to end of file, usually negative, although many platforms allow > seeking beyond the end of a file). If the file is opened in text mode, > only offsets returned by tell() are legal. Use of other offsets causes > undefined behavior. > Note that not all file objects are seekable. whence的值: - 默认值是0,表示从文件开头开始计算指针偏移的量(简称偏移量)。这是offset必须是大于等于0的整数。 - 是1时,表示从当前位置开始计算偏移量。offset如果是负数,表示从当前位置向前移动,整数表示向后移动。 - 是2时,表示相对文件末尾移动。 前面已经提到了,文件是可迭代的,并且还学过其它可迭代的对象,看来迭代是一个有必要讨论的问题。 ------ [总目录](./index.md)   |   [上节:文件(1)](./126.md)   |   [下节:迭代](./128.md) 如果你认为有必要打赏我,请通过支付宝:**qiwsir@126.com**,不胜感激。
X Tutup