Chapter 10  Overview of Python Third-Party Libraries
Contents
Obtaining and Installing Python Third-Party Libraries
Installing with the pip Tool
- The most common and efficient way to install a Python third-party library is with the pip tool. pip is the online third-party library installer provided and maintained by the Python project. On a system where both Python 2 and Python 3 are installed, use the pip3 command to install third-party libraries specifically for the Python 3 environment.
- pip is a command-line tool bundled with Python and must be run from the command line. Running pip -h lists its commonly used subcommands.
C:\Users\thinkpad>pip3 -h
Usage:
  pip <command> [options]

Commands:
  install      Install packages.
  download     Download packages.
  uninstall    Uninstall packages.
  freeze       Output installed packages in requirements format.
  list         List installed packages.
  show         Show information about installed packages.
  check        Verify installed packages have compatible dependencies.
  search       Search PyPI for packages.
  wheel        Build wheels from your requirements.
  hash         Compute hashes of package archives.
  completion   A helper command used for command completion.
  help         Show help for commands.
Installing a Library
- The command to install a library has the following format:
pip3 install <library name>
- Example:
C:\Users\thinkpad>pip3 install pygame
Collecting pygame
  Downloading pygame-1.9.3-cp36-cp36m-win32.whl (4.0MB)
    5% |█▊        | 215kB 28kB/s eta 0:02:11
If multiple Python versions are installed on the system, the following command installs a library into a specific version's site-packages directory. The value of the -t parameter can be obtained by running import sys; print(sys.path), as sketched after the command below.
pip3 install -t C:\Users\thinkpad\AppData\Local\Programs\Python\Python37\lib\site-packages xlrd
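A minimal sketch of locating a suitable -t value: print the interpreter's module search paths and pick the entry that ends in site-packages.

# show_site_packages.py -- print this interpreter's module search paths;
# the entry ending in "site-packages" is a suitable value for pip's -t option.
import sys

for path in sys.path:
    print(path)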
Viewing Installed Libraries
- pip3 list
C:\Users\thinkpad>pip3 list
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
pip (9.0.1)
pygame (1.9.3)
setuptools (28.8.0)
Uninstalling a Library
- pip3 uninstall pygame
Proceed (y/n)? y
  Successfully uninstalled pygame-1.9.3
Custom Installation
Installing from a File
When the target machine is offline and pip cannot install packages over the network, the corresponding third-party libraries can be downloaded from the site below and copied to the target machine for local installation.
https://www.lfd.uci.edu/~gohlke/pythonlibs/
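Once copied, the downloaded file can be installed locally by pointing pip3 at it. A minimal sketch, assuming a downloaded wheel named like the one in the earlier pygame example (the actual filename will vary by library and Python version):

pip3 install pygame-1.9.3-cp36-cp36m-win32.whl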
Overview of the PyInstaller Library
PyInstaller is a very useful third-party library: it can package Python source files into directly runnable executables.
It is installed from the command line with the following command:
pip3 install PyInstaller
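Once installed, PyInstaller is also run from the command line. As a minimal usage sketch (the script name hello.py is illustrative, not from the original), the -F option packages a script and its dependencies into a single executable, written to the dist subdirectory:

pyinstaller -F hello.py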
Overview of the jieba Library
jieba is an important third-party Chinese word-segmentation library: it splits a passage of Chinese text into a sequence of Chinese words. Installation:
pip3 install jieba
>>> import jieba
>>> jieba.lcut("山东商业职业技术学院")
Building prefix dict from the default dictionary ...
Dumping model to file cache C:\Users\thinkpad\AppData\Local\Temp\jieba.cache
Loading model cost 1.265 seconds.
Prefix dict has been built succesfully.
['山东', '商业职业', '技术', '学院']
jieba and Chinese Word Segmentation
jieba.lcut() is the most commonly used word-segmentation function; it segments in precise mode.
>>> import jieba
>>> ls = jieba.lcut("全国计算机等级考试Python科目")
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\thinkpad\AppData\Local\Temp\jieba.cache
Loading model cost 0.991 seconds.
Prefix dict has been built succesfully.
>>> print(ls)
['全国', '计算机', '等级', '考试', 'Python', '科目']
jieba.lcut("",cut_all=True)用于全模式,即将字符串的所有分词可能均列出来。
>>> import jieba
>>> ls = jieba.lcut("全国计算机等级考试Python科目", cut_all=True)
>>> print(ls)
['全国', '国计', '计算', '计算机', '算机', '等级', '考试', 'Python', '科目']
jieba.lcut_for_search() returns results in search-engine mode: it first runs precise segmentation and then further splits any long words in the result to produce the final output.
>>> import jieba
>>> ls = jieba.lcut_for_search("全国计算机等级考试Python科目")
>>> print(ls)
['全国', '计算', '算机', '计算机', '等级', '考试', 'Python', '科目']
Search-engine mode favors shorter words and, like full mode, introduces some redundancy, though less than full mode does.
jieba.add_word(), as its name suggests, adds a new word to the dictionary.
>>> import jieba
>>> jieba.add_word("Python科目")
>>> ls = jieba.lcut("全国计算机等级考试Python科目")
>>> print(ls)
['全国', '计算机', '等级', '考试', 'Python科目']
Overview of the wordcloud Library
Data can be presented in many ways. For text, there is strong demand for displays that are more intuitive and carry some artistic appeal, and the word cloud has proved a popular way to meet it.
wordcloud is a Python third-party library dedicated to generating word clouds from text; it is both useful and fun.
C:\Users\thinkpad>pip3 install wordcloud
Using wordcloud is very simple. Taking a string as an example:
>>> from wordcloud import WordCloud
>>> txt = 'I like python.I am learning python'
>>> wordcloud = WordCloud().generate(txt)
>>> wordcloud.to_file('d:\\testcloud.png')
<wordcloud.wordcloud.WordCloud object at 0x000001F62C1E2908>
>>>
Generating the word cloud itself takes just one statement (the third line of the session), and the result can then be saved as an image file.
wordcloud and Word-Cloud Visualization
When generating a word cloud, wordcloud splits the target text on spaces and punctuation by default. For Chinese text, the segmentation must be supplied by the user. The usual steps are: segment the text into words, join the words with spaces, and then call the wordcloud functions, as sketched below.
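A minimal sketch of this pipeline, assuming jieba is installed and a Chinese-capable font file such as msyh.ttc is available; the sample sentence and output filename are illustrative:

# cn_wordcloud.py -- segment Chinese text with jieba, then rejoin with
# spaces so wordcloud's default whitespace tokenizer can see the words.
import jieba
from wordcloud import WordCloud

txt = "全国计算机等级考试Python科目"
newtxt = ' '.join(jieba.lcut(txt))

# font_path must point to a font with Chinese glyphs, or the words
# will render as empty boxes.
wc = WordCloud(font_path="msyh.ttc").generate(newtxt)
wc.to_file("testcloud_cn.png")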
Worked Example: A Word Cloud of the Characters in Dream of the Red Chamber (《红楼梦》)
Initial Version
First, count the word frequencies with jieba:
# CalStoryOfStone.py
import jieba
f = open("红楼梦.txt", "r", encoding="utf-8")
txt = f.read()
f.close()
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:    # skip single-character tokens
        continue
    else:
        counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(15):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
Improved Version: Removing Common High-Frequency Words
# CalStoryOfStone2.py
import jieba
excludes = {"什么", "一个", "我们", "那里", "你们", "如今",
            "说道", "知道", "老太太", "起来", "姑娘", "这里",
            "出来", "他们", "众人", "自己", "一面", "太太",
            "只见", "怎么", "奶奶", "两个", "没有", "不是",
            "不知", "这个", "听见"}
f = open("红楼梦.txt", "r", encoding="utf-8")
txt = f.read()
f.close()
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:    # skip single-character tokens
        continue
    else:
        counts[word] = counts.get(word, 0) + 1
for word in excludes:
    del counts[word]
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(5):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
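As an aside, the counting-and-sorting steps above can be written more compactly with collections.Counter. A minimal sketch under the same assumptions (红楼梦.txt present, jieba installed); this is a hypothetical alternative, not part of the original example, and the stop-word set is abbreviated for brevity:

# CalStoryOfStone2_counter.py -- hypothetical alternative using Counter.
import jieba
from collections import Counter

excludes = {"什么", "一个", "我们"}   # abbreviated stop-word set

with open("红楼梦.txt", "r", encoding="utf-8") as f:
    words = jieba.lcut(f.read())

# Count tokens longer than one character that are not stop words.
counts = Counter(w for w in words if len(w) > 1 and w not in excludes)
for word, count in counts.most_common(5):
    print("{0:<10}{1:>5}".format(word, count))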
Drawing the Character Word Cloud
# CalStoryOfStone3.py
import jieba
from wordcloud import WordCloud
excludes = {"什么", "一个", "我们", "那里", "你们", "如今",
            "说道", "知道", "老太太", "起来", "姑娘", "这里",
            "出来", "他们", "众人", "自己", "一面", "太太",
            "只见", "怎么", "奶奶", "两个", "没有", "不是",
            "不知", "这个", "听见"}
f = open("红楼梦.txt", "r", encoding="utf-8")
txt = f.read()
f.close()
words = jieba.lcut(txt)
newtxt = ' '.join(words)
wordcloud = WordCloud(background_color="white",
                      width=800,
                      height=600,
                      font_path="msyh.ttc",
                      max_words=200,
                      max_font_size=80,
                      stopwords=excludes,
                      ).generate(newtxt)
wordcloud.to_file('红楼梦基本词云.png')