Selected Examples of Text File Content Operations in Python

Reading and Writing Files

Example 9-1: Write content to a text file, then read it back.

s = 'Hello world\n文本文件的读取方法\n文本文件的写入方法\n'

with open('sample.txt', 'w') as fp:    # default encoding is platform-dependent (cp936 on Chinese-locale Windows)
    fp.write(s)

with open('sample.txt') as fp:         # read back with the same default encoding
    print(fp.read())
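
Because the default encoding used by open() depends on the platform and locale, the same script can write different bytes on different machines. A minimal sketch, assuming UTF-8 is the desired encoding and using a hypothetical helper name round_trip (not part of the original example), passes the encoding explicitly for both the write and the read:

```python
import os
import tempfile

def round_trip(text, path, encoding='utf-8'):
    """Write text to path, then read it back with the same explicit encoding."""
    with open(path, 'w', encoding=encoding) as fp:
        fp.write(text)
    with open(path, encoding=encoding) as fp:
        return fp.read()

# Demonstration in a temporary directory so no existing file is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'sample.txt')
    print(round_trip('Hello world\n文本文件的读取方法\n', sample))
```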

Example 9-2: Copy the entire content of a CP936-encoded text file into another text file that uses UTF-8 encoding.

def fileCopy(src, dst, srcEncoding, dstEncoding):
    with open(src, 'r', encoding=srcEncoding) as srcfp:
        with open(dst, 'w', encoding=dstEncoding) as dstfp:
            dstfp.write(srcfp.read())

fileCopy('sample.txt', 'sample_new.txt', 'cp936', 'utf8')
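
fileCopy reads the whole source file into memory at once, which is fine for small files. For large files, one possible variation is to copy in fixed-size chunks. The sketch below is an illustration under that assumption; the name file_copy_chunked and the chunk_size parameter are hypothetical, and newline='' is passed so that line endings are copied unchanged.

```python
import os
import tempfile

def file_copy_chunked(src, dst, src_encoding, dst_encoding, chunk_size=64 * 1024):
    """Re-encode src into dst, reading at most chunk_size characters at a time."""
    with open(src, 'r', encoding=src_encoding, newline='') as srcfp, \
         open(dst, 'w', encoding=dst_encoding, newline='') as dstfp:
        while True:
            chunk = srcfp.read(chunk_size)   # in text mode the size is in characters
            if not chunk:                    # empty string means end of file
                break
            dstfp.write(chunk)

# Demonstration on temporary files.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'sample.txt')
    dst = os.path.join(tmp, 'sample_new.txt')
    with open(src, 'w', encoding='cp936') as fp:
        fp.write('Hello world\n文本文件的读取方法\n')
    file_copy_chunked(src, dst, 'cp936', 'utf8', chunk_size=4)
    with open(dst, encoding='utf8') as fp:
        print(fp.read())
```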

Modifying Web Pages

Example 9-10: Modify an HTML file, using an iframe to embed another HTML page.

def infectHtml(fileName, infectedContent):
    with open(fileName, 'a+') as fp:
        fp.write(infectedContent)

content = '<iframe src="anotherHtml.html" height=50px width=200px></iframe>'
infectHtml('index.html', content)

Case study: building an HTML e-book (automatically generating web pages that contain images)

import os
import shutil    # used to copy the template page

def infectHtml(fileName, infectedContent):
    with open(fileName, 'r') as fp:
        lines = fp.readlines()
    for index, line in enumerate(lines):
        if line.strip().lower().startswith('<body>'):
            lines.insert(index+1, infectedContent)
            break
    with open(fileName, 'w') as fp:
        fp.writelines(lines)


#s=input("pls input the dir:")


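for s in os.listdir("."):    # each subdirectory of the current directory is one chapter of the e-book
    if os.path.isdir(s):     # build a page only for directories
        str1 = "<p><img src=\"" + s + "/"    # forward slash is the URL path separator
        str2 = "\" width=\"600px\"/></p>"
        x = []
        shutil.copyfile('index.html', s + ".html")    # start the chapter page as a copy of index.html
        for fname in os.listdir(s):
            content = str1 + fname + str2
            print(content)
            x.append(content)

        # infectHtml inserts directly after <body>, so popping from the end
        # leaves the images on the page in the same order as in x
        while len(x) != 0:
            info = x.pop()
            infectHtml(s + ".html", info)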
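
os.listdir returns entries in arbitrary order, and file names may contain characters that need escaping inside an HTML attribute. The following sketch shows one way to build the image tags deterministically. The helper name image_tags and the width parameter are hypothetical and not part of the original script.

```python
import html
import os
import tempfile

def image_tags(chapter_dir, width='600px'):
    """Return one <p><img></p> line per file in chapter_dir, sorted by file name."""
    chapter = os.path.basename(os.path.normpath(chapter_dir))
    tags = []
    for fname in sorted(os.listdir(chapter_dir)):
        # Escape the path so quotes or ampersands in names cannot break the attribute.
        src = html.escape(chapter + '/' + fname, quote=True)
        tags.append('<p><img src="{0}" width="{1}"/></p>\n'.format(src, width))
    return tags

# Demonstration with a temporary chapter directory holding two empty files.
with tempfile.TemporaryDirectory() as tmp:
    chapter_dir = os.path.join(tmp, 'chapter1')
    os.mkdir(chapter_dir)
    for name in ('02.png', '01.png'):
        open(os.path.join(chapter_dir, name), 'w').close()
    print(''.join(image_tags(chapter_dir)))
```

The returned list could be joined into a single string and passed to infectHtml in one call, instead of rewriting the page once per image.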

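Modifying Specific Characters

Example 9-4: Suppose a text file sample.txt already exists; change its 13th and 14th characters to the two-character string 测试.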

with open('sample.txt', 'r+') as fp:
    fp.seek(13)
    fp.write('测试')
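
In Example 9-4, seek() works with file positions rather than character indexes, so the result depends on the file's encoding and on how its line endings are stored. A sketch of an alternative, assuming the file is small enough to read into memory and is encoded as UTF-8, edits by character index instead. The helper name replace_chars is hypothetical.

```python
import os
import tempfile

def replace_chars(path, start, new_text, encoding='utf-8'):
    """Overwrite len(new_text) characters of the file, beginning at character index start."""
    with open(path, 'r', encoding=encoding) as fp:
        text = fp.read()                     # each line break counts as one character
    text = text[:start] + new_text + text[start + len(new_text):]
    with open(path, 'w', encoding=encoding) as fp:
        fp.write(text)

# Demonstration on a temporary file: index 12 is the 13th character.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'sample.txt')
    with open(sample, 'w', encoding='utf-8') as fp:
        fp.write('Hello world\n文本文件的读取方法\n')
    replace_chars(sample, 12, '测试')
    with open(sample, encoding='utf-8') as fp:
        print(fp.read())
```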

Iterating Over All Lines of a File

Example 9-3: Iterate over and print every line of a text file.

with open('sample.txt') as fp:      # assumes the file is CP936-encoded
    for line in fp:                 # file objects can be iterated directly, line by line
        print(line)

Example 9-6: Find the length and content of the longest line in a text file.

with open('sample.txt') as fp:
    result = [0, '']
    for line in fp:
        t = len(line)
        if t > result[0]:
            result = [t, line]
print(result)
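
The same search can be written more compactly with the built-in max() and key=len. This is a sketch rather than the original author's approach; the function name longest_line is hypothetical and UTF-8 is an assumed encoding. As in the loop above, the length includes the trailing newline, and the first of several equally long lines wins.

```python
import os
import tempfile

def longest_line(path, encoding='utf-8'):
    """Return (length, line) for the longest line; (0, '') for an empty file."""
    with open(path, encoding=encoding) as fp:
        line = max(fp, key=len, default='')  # default avoids ValueError on an empty file
    return len(line), line

# Demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'sample.txt')
    with open(sample, 'w', encoding='utf-8') as fp:
        fp.write('Hello world\n文本文件的读取方法\n')
    print(longest_line(sample))
```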

Example 9-11: Modify an HTML file, inserting JavaScript that runs automatically when the page opens.

def infectHtml(fileName, infectedContent):
    with open(fileName, 'r') as fp:
        lines = fp.readlines()
    for index, line in enumerate(lines):
        if line.strip().lower().startswith('<html>'):
            lines.insert(index+1, infectedContent)
            break
    with open(fileName, 'w') as fp:
        fp.writelines(lines)

content = '<head><script>window.onload=function(){alert("test");}</script></head>'
infectHtml('index.html', content)

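Example 9-5: Suppose the file data.txt contains a number of integers separated by commas. Write a program that reads all the integers, sorts them in ascending order, and writes them to the text file data_asc.txt.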

with open('data.txt', 'r') as fp:
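    data = fp.readlines()                         # read all lines
data = [line.strip() for line in data]            # strip whitespace from both ends of each line
data = ','.join(data)                             # merge all lines
data = data.split(',')                            # split into number strings
data = [int(item) for item in data]               # convert to integers
data.sort()                                       # sort in ascending order
data = ','.join(map(str,data))                    # convert the result back to a string
with open('data_asc.txt', 'w') as fp:             # write the result to the file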
    fp.write(data)
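
The steps above raise ValueError if data.txt contains a blank line or a trailing comma, because int('') is not a valid conversion. A minimal sketch of a more tolerant parser follows, assuming that empty fields should simply be skipped. The function name sort_integers is hypothetical.

```python
def sort_integers(text):
    """Parse comma- or newline-separated integers and return them sorted, comma-joined."""
    tokens = text.replace('\n', ',').split(',')
    # Skip empty fields; int() itself ignores surrounding whitespace.
    numbers = sorted(int(tok) for tok in tokens if tok.strip())
    return ','.join(map(str, numbers))

print(sort_integers('3,1,\n10, 2\n\n'))
```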

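Storing Data as JSON

JSON stands for JavaScript Object Notation.

JSON is a syntax for storing and exchanging textual information, similar in purpose to XML.

JSON is typically more compact than the equivalent XML and easier to parse.

Example 9-7: Use the standard library module json to exchange data.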

import json
dt = {'a': 1, 'b': 2, 'c': 3}
with open('test.txt', 'w') as fp:
    json.dump(dt, fp)                    # write to the file

with open('test.txt', 'r') as fp:
    print(json.load(fp))                 # read back from the file

Output:

{'a': 1, 'b': 2, 'c': 3}
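
By default json.dump escapes non-ASCII characters as \uXXXX sequences. If the file should stay human-readable, ensure_ascii=False and indent can be passed. The sketch below assumes UTF-8 files; the helper names save_json and load_json are hypothetical.

```python
import json
import os
import tempfile

def save_json(obj, path):
    """Write obj as indented JSON, keeping non-ASCII characters readable."""
    with open(path, 'w', encoding='utf-8') as fp:
        json.dump(obj, fp, ensure_ascii=False, indent=2)

def load_json(path):
    """Read a JSON document from a UTF-8 file."""
    with open(path, encoding='utf-8') as fp:
        return json.load(fp)

# Demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'test.json')
    save_json({'name': '测试', 'values': [1, 2, 3]}, target)
    print(load_json(target))
```

Note that JSON object keys are always strings, so a dictionary with integer keys comes back with string keys after a round trip.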

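Storing Data as CSV

Comma-separated values (CSV, sometimes called character-separated values because the delimiter need not be a comma) files store tabular data (numbers and text) as plain text. Plain text means the file is a sequence of characters and contains no data that must be interpreted as binary numbers. A CSV file consists of any number of records separated by line breaks; each record consists of fields separated by some other character or string, most commonly a comma or a tab. Usually all records have the same sequence of fields. Because CSV files are plain text, they can be opened with WordPad or Notepad, or saved under a new name and then opened in Excel.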

>>> import csv
>>> with open('test.csv', 'w', newline='') as fp:
    test_writer = csv.writer(fp, delimiter=',', quotechar='"')
    test_writer.writerow(['red', 'blue', 'green'])  # write one row
    test_writer.writerow(['test_string']*5)

>>> with open('test.csv', newline='') as fp:
    test_reader = csv.reader(fp, delimiter=',', quotechar='"')  # delimiter must match the one used when writing
    for row in test_reader:                         # iterate over all rows
        print(row)                                  # each row is returned as a list
['red', 'blue', 'green']
['test_string', 'test_string', 'test_string', 'test_string', 'test_string']
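
Since the delimiter need not be a comma, the same module can handle tab-separated files, and csv.DictWriter / csv.DictReader can map each record to named fields. The following is a sketch under those assumptions; the helper names write_rows and read_rows and the sample field names are hypothetical.

```python
import csv
import os
import tempfile

def write_rows(path, fieldnames, rows, delimiter='\t'):
    """Write a header row followed by one record per dictionary in rows."""
    with open(path, 'w', newline='', encoding='utf-8') as fp:
        writer = csv.DictWriter(fp, fieldnames=fieldnames, delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)

def read_rows(path, delimiter='\t'):
    """Read the records back as a list of dictionaries keyed by the header row."""
    with open(path, newline='', encoding='utf-8') as fp:
        return list(csv.DictReader(fp, delimiter=delimiter))

# Demonstration on a temporary file; all values are read back as strings.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'test.tsv')
    write_rows(target, ['color', 'count'],
               [{'color': 'red', 'count': '1'}, {'color': 'blue', 'count': '2'}])
    for row in read_rows(target):
        print(row['color'], row['count'])
```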

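Traversing Files

Example 9-9: Write a program that counts the number of distinct code lines across all C++ source files in a given directory.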

from os.path import isdir, join
from os import listdir

NotRepeatedLines = []                         # distinct code lines seen so far
file_num = 0                                  # number of files
code_num = 0                                  # total number of code lines

def LinesCount(directory):
    global NotRepeatedLines, file_num, code_num

    for filename in listdir(directory):
        temp = join(directory, filename)
        if isdir(temp):                                # recurse into subdirectories
            LinesCount(temp)
        elif temp.endswith('.cpp'):                    # only consider .cpp files
            file_num += 1
            with open(temp, 'r') as fp:
                for line in fp:
                    line = line.strip()                # strip whitespace from both ends
                    if line not in NotRepeatedLines:
                        NotRepeatedLines.append(line)  # record lines not seen before
                    code_num += 1                      # count every line

path = r'C:\Users\Dong\Desktop\VC++6.0'
LinesCount(path)                              # run the traversal before reporting
print('Total lines: {0}, distinct lines: {1}'.format(code_num,
                                                     len(NotRepeatedLines)))
print('Number of files: {0}'.format(file_num))
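
Testing membership in a list takes time proportional to its length, so the count above slows down as the number of distinct lines grows. One possible variation, sketched here, stores the lines in a set, uses os.walk in place of manual recursion, and returns the totals instead of updating globals. The function name count_lines, the suffix parameter, and the UTF-8 encoding with errors='replace' are assumptions, not part of the original example.

```python
import os
import tempfile

def count_lines(directory, suffix='.cpp', encoding='utf-8'):
    """Return (file count, total lines, distinct stripped lines) for matching files."""
    unique_lines = set()
    file_num = 0
    code_num = 0
    for root, _dirs, files in os.walk(directory):    # visits subdirectories as well
        for name in files:
            if not name.endswith(suffix):
                continue
            file_num += 1
            full_path = os.path.join(root, name)
            # errors='replace' keeps the count going if a file is not valid UTF-8.
            with open(full_path, encoding=encoding, errors='replace') as fp:
                for line in fp:
                    unique_lines.add(line.strip())
                    code_num += 1
    return file_num, code_num, len(unique_lines)

# Demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'sub'))
    with open(os.path.join(tmp, 'a.cpp'), 'w', encoding='utf-8') as fp:
        fp.write('int a;\nint a;\n')
    with open(os.path.join(tmp, 'sub', 'b.cpp'), 'w', encoding='utf-8') as fp:
        fp.write('int a;\nint b;\n')
    print(count_lines(tmp))
```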
