The course includes a 7-hour recording (200 slides plus multiple B2B cases in China). This introductory course is designed for B2B marketers who are new to China’s digital marketing system.

Key Content

* Digital landscape and major players in China
* Website marketing with Baidu (and its ecosystem), including Search Engine Optimization (SEO) and Search Engine Marketing (SEM) tips and tricks.
* Native ads with WeChat (ad channels and formats, integrating online and offline marketing in China)
* Social media marketing (overview of Chinese social media channels, content marketing with WeChat/Weibo/Douyin, and advanced techniques)
* Multiple case studies in different B2B industries


I was recently working on a new project with many Excel files, which include field names and sample values for each database table. I thought it would be very useful to have a quick Python script to merge them all into one big file.

To showcase the code, I created three files: customer.xlsx, event.xlsx, and referrals.xlsx.

And the code:

import os
import pandas as pd

cwd = os.path.abspath('')  # leave blank if your code and files are in the same folder
all_files = os.listdir(cwd)

writer = pd.ExcelWriter('data_model.xlsx')  # create a new file to be filled in later; you can name it whatever you want

for file in…
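One way a merge like this could be finished is sketched below. It is a minimal sketch and not the code from the post: it assumes each source workbook should land on its own sheet of data_model.xlsx, that only the first sheet of each file matters, and that an Excel engine such as openpyxl is installed. The helper names (`list_source_workbooks`, `sheet_name_for`, `merge_workbooks`) are made up for illustration.

```python
import os
import pandas as pd

OUTPUT_NAME = "data_model.xlsx"  # same output file name as in the snippet above


def list_source_workbooks(filenames, output_name=OUTPUT_NAME):
    """Keep .xlsx files, skipping the output file and Excel lock files (~$...)."""
    return sorted(
        name for name in filenames
        if name.lower().endswith(".xlsx")
        and name != output_name
        and not name.startswith("~$")
    )


def sheet_name_for(filename):
    """File name without extension, cut to Excel's 31-character sheet-name limit."""
    return os.path.splitext(filename)[0][:31]


def merge_workbooks(folder, output_name=OUTPUT_NAME):
    """Copy the first sheet of every workbook in `folder` into one output workbook.

    Assumption: one sheet per source file; the original post may do it differently.
    """
    sources = list_source_workbooks(os.listdir(folder), output_name)
    if not sources:
        return []  # nothing to merge, so do not create an empty workbook
    with pd.ExcelWriter(os.path.join(folder, output_name)) as writer:
        for name in sources:
            frame = pd.read_excel(os.path.join(folder, name))
            frame.to_excel(writer, sheet_name=sheet_name_for(name), index=False)
    return sources
```

Calling `merge_workbooks(os.path.abspath(''))` from the folder that holds customer.xlsx, event.xlsx, and referrals.xlsx would then write one sheet per file. The `with` block saves and closes the writer automatically, so no explicit save call is needed.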


O’Reilly learning, formerly known as Safari Books Online, is an online platform with a vast array of technical content: over 40,000 books, video courses, live training, and other valuable materials. I’ve been using the service since my first job, where my company purchased company-wide access. Last year I started my video series for it, making recommendations on new books.

The platform offers a 7-day free trial, but honestly the $49/month (or $499/year) plan is a bit too pricey for an individual learner.

There’s indeed another way, and it’s through joining ACM: Association for Computing Machinery. If you work on the data analytics/mining…


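Note: both new books recommended this time were published in March 2019, so there are no Chinese editions yet. Both can be purchased on Amazon US or read on the online reading platforms of safaribooksonline and Packt, and both provide code and data on GitHub for download.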

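Since I come from a marketing background myself, I always pay close attention to material that teaches how data analytics is put into practice in marketing scenarios. Generally speaking, content products of this kind (books, courses, training) cover three areas: business understanding (what business problem we are solving and why it matters), statistical knowledge (which methods can solve it and why this particular method is taught), and technical tools (why use this tool, what its inputs and outputs are, and how the result is finally put into practice in the business). Ideally, we would weight them in that order, but in reality the vast majority of content gets the priorities backwards.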


The 6th episode of my book club continues with data visualization and Tableau.


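A Look at the Built-in Model Templates

What I have not mentioned so far is that RapidMiner actually ships with many common modeling workflows, which is a big advantage over Alteryx. When you open RapidMiner, you can see these model workflow templates:

  • Customer churn model
  • Marketing prediction model
  • Credit risk model
  • Market basket analysis, and so on

These templates are all well worth studying.

Before moving on to classification models, we can take a look at the customer churn model example given here.

The interesting part here is the numerical to binominal step in the middle, because the raw data has no churn 1/0 variable.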


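I really had no time today, so I just flipped through the RapidMiner course material from 炼数成金 (Dataguru). It feels fairly elementary in its design, though some of the slides may be useful to beginners. Excerpts follow.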


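As mentioned earlier, linear regression cannot handle categorical variables directly. In practice we need to preprocess them, and the most popular approach today is called one hot encoding: the process of converting categorical variables into a form that machine learning algorithms can readily use.

For a detailed explanation, see: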

https://zhuanlan.zhihu.com/p/37471802

https://my.oschina.net/hjchhx/blog/1832603

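I placed one hot encoding right after the input data. In its configuration, you can see that the system already keeps only the categorical variables for you to choose from. To avoid making the model too complex, I selected only two variables: gender and marital status.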
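For readers who want to see what the encoding does to a table, here is a small sketch in Python with pandas. This is a stand-in for illustration, not the RapidMiner operator itself, and the column names and sample values (`gender`, `marital_status`, `age`) are invented.

```python
import pandas as pd

# Hypothetical customer rows; only the two categorical columns get encoded.
customers = pd.DataFrame({
    "gender": ["F", "M", "F"],
    "marital_status": ["single", "married", "married"],
    "age": [34, 45, 29],
})

# Each category value becomes its own 0/1 column, named <column>_<value>.
encoded = pd.get_dummies(
    customers,
    columns=["gender", "marital_status"],
    dtype=int,
)

print(encoded.columns.tolist())
```

Each of the two categorical columns is replaced by one 0/1 column per distinct value, while `age` passes through unchanged. For linear regression specifically, passing `drop_first=True` to `get_dummies` drops one column per variable, which avoids perfectly collinear inputs.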

Peng's Draft

10+ year analytics professional || Host of Peng’s Book Club@Youtube || Advocate data for better (personal and social) decisions
